In the context of a large database system with very complex schemas, it is
often error-prone to construct a multi-table join SQL query statement.  For
instance, in the following case, in order to retrieve all the translate-able
exons for a given translation (translation_id = 15121), we need to write a
four-way join statement:


+----------------+---------------+-----------+---------------+---------+-------------+
| translation_id | transcript_id | seq_start | start_exon_id | seq_end |
end_exon_id |
+----------------+---------------+-----------+---------------+---------+-------------+

|          15121 |         15960 |         7 |         95160 |      75
|       95172 |
+----------------+---------------+-----------+---------------+---------+-------------+


mysql> select * from exon_transcript where transcript_id in (15960) order by
rank;
+---------+---------------+------+
| exon_id | transcript_id | rank |
+---------+---------------+------+
|   95144 |         15960 |    1 |
|   95152 |         15960 |    2 |
|   95160 |         15960 |    3 |
|   95020 |         15960 |    4 |
|   95035 |         15960 |    5 |
|   95050 |         15960 |    6 |
|   95059 |         15960 |    7 |
|   95069 |         15960 |    8 |
|   95081 |         15960 |    9 |
|   95088 |         15960 |   10 |
|   95101 |         15960 |   11 |
|   95110 |         15960 |   12 |
|   95172 |         15960 |   13 |
+---------+---------------+------+

sql_statement = 'SELECT t3.exon_id FROM translation AS tr, exon_transcript
AS t1, exon_transcript AS t2, exon_transcript AS t3 WHERE tr.translation_id
== 15121 AND tr.transcript_id == t1.transcript_id == t2.transcript_id ==
t3.transcript_id AND t1.exon_id == tr.start_exon_id AND t2.exon_id ==
tr.end_exon_id AND t3.rank >= t1.rank AND t3.rank <= t2.rank ORDER BY
t3.rank'

I recently discussed this problem with Chris and we agreed that it would be
better to construct the above query only once and then save the complex
relationship between translation and exons into pygr.Data namespace:

translationExons = myMapper(sourceDB = ensemblTranslation, targetDB =
ensemblExon, sql_statement)

# save translationExons to pygr.Data

Then, at the application-code level, all we need to do is just the following
lines of code:

# get the translationExons mapper from pygr.Data
translationExons =
pygr.Data.Bio.Annotation.Ensembl.homo_sapiens_core_47_36i.translationExons()

for exon in translationExons[translation]:
    # do something ...
    print exon.seq_region_start, len(exon.sequence)
    ...

I really like this solution, but I am also open to other suggestions.  I'd
love to hear other possible elegant ways to simplify the multi-table join
complications, such as using SQLAlchemy



Thanks,

Jenny

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pygr-dev" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pygr-dev?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to