Hi Andrea,

OK, I have a better idea now of what you want. The situation is as follows: 0.8 rc5 automates and integrates nicely many of the workflows needed to create a new mart from a source schema. However, the Ensembl core transformation in particular is a very complex one, and rc5 still has only rudimentary support for it. If you just want an idea of how the algorithm works, it is better to start with a simpler use case than the Ensembl mart.

It will be difficult to recreate the exact Ensembl mart transformation from scratch, for two reasons. First, rc5 still has only very rudimentary support for this particular schema, so you will not get far. Second, although 0.7 fully supports it, there are a large number of 'tweaks' (a.k.a. hacks) to the transformation algorithm to get certain things to work, so you will find many of them difficult to recreate.

I would advise you to play with any schema to get a few datasets working (the Ensembl core schema is fine to play with too). If you want to build new marts and integrate them with Ensembl, definitely go for rc5 and treat the existing Ensembl mart as a black box; the software will provide a backwards-compatibility mechanism to integrate it nicely with your newly created mart. This is much easier in 0.8 rc5. If, however, you want to see exactly how the Ensembl mart transformation is achieved, you will need a 0.7 XML transformation file from the Ensembl team.
FYI, here is a short description of the basic transformation algorithm. Starting from one or more input “candidate” tables, the software finds the largest set of table joins it can perform using only 1:1 and many-to-one (M:1) relations, and merges these tables together to create the main table. Multiple candidate tables can be given as input, in which case the algorithm creates a main table out of each selected candidate table and, if unable to do so, creates several separate datasets. Once the main tables are complete, if there is a 1:M relation between them, they become main and sub-main tables; if there is no 1:M relation between them, they are split into separate datasets. Any tables that have a 1:M or many-to-many (M:N) relation with the newly created main table or sub-main table are made into independent dimension tables.

Please let us know if you have any more questions,
a

Arek Kasprzyk
Director, Bioinformatics Operations and Principal Investigator
Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3
Tel: 416-673-8559
Toll-free: 1-866-678-6427
www.oicr.on.ca
Administrative Assistant: [email protected]

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.
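To make the join rules above concrete, here is a minimal Python sketch of the algorithm as Arek describes it. It is not BioMart code: the `Relation` class, the helper names (`neighbours`, `build_main`, `dimension_tables`, `link_candidates`) and the toy schema are hypothetical, and the main/sub-main check only looks at direct relations between two candidate tables, which is a simplification. The key idea is that following only 1:1 and M:1 joins keeps exactly one row per candidate row, so those tables can be merged; anything reached through 1:M or M:N would multiply rows and becomes a dimension table instead.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterator, Optional

# Cardinality of the same relation read from the other end.
INVERSE = {"1:1": "1:1", "M:1": "1:M", "1:M": "M:1", "M:N": "M:N"}


@dataclass(frozen=True)
class Relation:
    """Hypothetical schema edge; cardinality is read from `source` to `target`,
    e.g. "M:1" means many source rows point at one target row."""
    source: str
    target: str
    cardinality: str  # "1:1", "M:1", "1:M" or "M:N"


def neighbours(table: str, relations: list[Relation]) -> Iterator[tuple[str, str]]:
    """Yield (other_table, cardinality as seen from `table`)."""
    for r in relations:
        if r.source == table:
            yield r.target, r.cardinality
        elif r.target == table:
            yield r.source, INVERSE[r.cardinality]


def build_main(candidate: str, relations: list[Relation]) -> set[str]:
    """Merge every table reachable from `candidate` through 1:1 or M:1 joins only,
    so the merged main table still has one row per candidate row."""
    merged = {candidate}
    queue = deque([candidate])
    while queue:
        table = queue.popleft()
        for other, card in neighbours(table, relations):
            if card in ("1:1", "M:1") and other not in merged:
                merged.add(other)
                queue.append(other)
    return merged


def dimension_tables(main: set[str], relations: list[Relation]) -> set[str]:
    """Tables with a 1:M or M:N relation to any merged table become dimension tables."""
    return {
        other
        for table in main
        for other, card in neighbours(table, relations)
        if card in ("1:M", "M:N") and other not in main
    }


def link_candidates(a: str, b: str, relations: list[Relation]) -> Optional[tuple[str, str]]:
    """Return (main, sub_main) if two candidates share a direct 1:M relation,
    or None if they should be split into separate datasets."""
    for other, card in neighbours(a, relations):
        if other == b and card == "1:M":
            return a, b
        if other == b and card == "M:1":
            return b, a
    return None


# Hypothetical toy schema, loosely modelled on the Ensembl core tables named in this thread.
RELATIONS = [
    Relation("transcript", "gene", "M:1"),
    Relation("translation", "transcript", "M:1"),
    Relation("protein_feature", "translation", "M:1"),
    Relation("protein_feature", "analysis", "M:1"),
]

if __name__ == "__main__":
    main = build_main("translation", RELATIONS)
    print(sorted(main))                                      # ['gene', 'transcript', 'translation']
    print(sorted(dimension_tables(main, RELATIONS)))         # ['protein_feature']
    print(link_candidates("gene", "transcript", RELATIONS))  # ('gene', 'transcript')
```

With `translation` as the candidate, `transcript` and `gene` are folded into the main table and `protein_feature` becomes a dimension table. `analysis` is not listed because it only relates to `protein_feature`; this sketch does not attempt to denormalise such lookups into dimension tables, nor any of the Ensembl-specific tweaks mentioned earlier.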
From: Andrea Edwards <[email protected]>
Date: Sat, 12 Mar 2011 14:13:24 -0500
To: "[email protected]" <[email protected]>
Subject: Re: [BioMart Users] how to get from ensembl main database schema to ensembl mart schema

Hi,

I am not concerned whether I use BioMart 0.7 or 0.8, whichever is easiest for what I would like to do. I haven't done anything yet and I'm starting from scratch. All I want to do is have a go at re-creating the Ensembl mart from the Ensembl core databases. I chose Ensembl because it is a database whose schema I am familiar with and whose mart I have used. I wanted to do this for three reasons:
a) to get some practice;
b) to get an intuition of what type of mart I can create from my own database schema, what types of query I can run, and what the filters/attributes will be;
c) to get an idea of how I could integrate my database with Ensembl, as I believe they only need to share IDs or the underlying assembly to be integrated.

Will I be able to recreate the Ensembl mart in BioMart 0.8? I presume the Ensembl XML files are available for 0.7 and I won't be able to read them in 0.8? Without these files, how will I know the exact steps Ensembl used to specify their mart structure? How will I know which main tables they chose or how, for example, they created the PRINTS dimension table mentioned in my original query?

Thanks a lot

On 12/03/2011 18:59, Arek Kasprzyk wrote:
Putting this back on the list to keep everyone else in the loop
a

On 2011-03-12, at 13:56, "Arek Kasprzyk" <[email protected]> wrote:
If you are starting from scratch, it would be much better to start with 0.8 rc5. Creating a new mart is as simple as choosing one or more main tables in the source schema. You can choose different tables and create different datasets. There is some documentation about it in rc5.
If you want to know how the transformation algorithm works, I can describe that to you too.
a

On 2011-03-12, at 12:53, "Andrea Edwards" <[email protected]> wrote:
OK, thanks. I don't know much about BioMart, as you can probably tell, but I was told there are quite significant differences between 0.7 and 0.8. If I am interested in understanding how the schema transformations take place so that I can design my own mart and integrate it with existing marts, would I be better off dropping back to 0.7? I'm keen to get a mart up and running very soon.

On 12/03/2011 17:41, Arek Kasprzyk wrote:
0.8 rc5 still has only rudimentary support for the MBuilder component. You will not be able to read 0.7 MBuilder XML with it. (CCing Junjun, who has just taken over the coordination of BioMart development, to let him know that such discussions are taking place.)
a

On 2011-03-12, at 12:28, "Andrea Edwards" <[email protected]> wrote:
Brilliant, thanks for such a prompt reply. I note that you say MBuilder (0.7), whereas I have checked out the code for BioMart 0.8 rc4.

On 12/03/2011 16:39, Arek Kasprzyk wrote:
Hi Andrea,
All the transformation information is stored in the XML file that MBuilder (0.7) uses to compile its DDL for the Ensembl core databases. I am sure the Ensembl mart team will be happy to provide you with the latest version.
a

On 2011-03-12, at 11:15, "Andrea Edwards" <[email protected]> wrote:
Hello,
I was wondering if there are any documents showing how the Ensembl marts were created from the main Ensembl databases. Specifically, I was hoping there were documents describing which tables were selected as main tables for the marts and how the dimension tables were mapped to the main tables.
As an example, ensembl_mart_61 contains a main table for human named translation_main (this is an abbreviation of the name, but it's obvious which one I mean), and this has a field called protein_feature_prints_bool, which is essentially a boolean field indicating whether a protein translation is associated with a row in the PRINTS dimension table protein_feature_prints_dm. If the translation does have a row in this dimension table, then I am guessing it has a PRINTS domain in it!

The core database itself, however, has a table called translation which represents, well, a translation. Translations are linked to rows in a table called 'protein_feature', which in turn has a foreign key called analysis_id that links to an 'analysis' table with fields 'database' and 'program'. So in this schema, a translation is associated with a PRINTS annotation if it is linked to a 'protein_feature' record which is in turn linked to an 'analysis' record with the text 'PRINTS' somewhere in either or both of the 'database' and 'program' fields.

I am interested in how the BioMart software is configured with 'rules' to create the mart schema from the database schema. Is there a configuration file with these rules that I could look at? Is there a worked example? As an academic exercise, I'd like to recreate the Ensembl marts. I have the BioMart user manual, but even with that document I do not know how to recreate the Ensembl marts.

I am NOT specifically interested in protein domains. I used the PRINTS example purely for illustrative purposes, as I thought it was a straightforward example. I am interested in how you specify the 'rules' to get from a schema to a mart.

Thanks a lot

_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
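Andrea's reading of how protein_feature_prints_bool relates to the core tables can be checked on a toy database. The sketch below uses Python's built-in `sqlite3` with a hypothetical, heavily simplified subset of the three tables she names; the rows are made up, the analysis column names follow her description ('database', 'program'), and the real core schema and the actual Ensembl build rules (which live in the 0.7 MBuilder XML) may differ. It builds a dimension table from the protein_feature rows whose analysis mentions PRINTS, then derives the per-translation boolean with an `EXISTS` test against it.

```python
import sqlite3


def build_toy_core_db() -> sqlite3.Connection:
    """Hypothetical in-memory subset of the core tables described above (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE translation (translation_id INTEGER PRIMARY KEY);
        CREATE TABLE analysis (
            analysis_id INTEGER PRIMARY KEY,
            "database"  TEXT,  -- quoted because DATABASE is an SQL keyword
            program     TEXT
        );
        CREATE TABLE protein_feature (
            protein_feature_id INTEGER PRIMARY KEY,
            translation_id     INTEGER REFERENCES translation(translation_id),
            analysis_id        INTEGER REFERENCES analysis(analysis_id)
        );
        INSERT INTO translation VALUES (1), (2), (3);
        INSERT INTO analysis VALUES (1, 'PRINTS', 'prints_scan'), (2, 'Pfam', 'hmm_search');
        INSERT INTO protein_feature VALUES (10, 1, 1), (11, 1, 2), (12, 2, 2);
    """)
    return conn


def derive_prints_flag(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Create the PRINTS dimension table, then return (translation_id, flag) pairs.
    Intended to be called once on a fresh connection."""
    # Dimension table: protein_feature rows whose analysis mentions PRINTS in either
    # field (SQLite's LIKE is case-insensitive for ASCII by default).
    conn.execute("""
        CREATE TABLE protein_feature_prints_dm AS
        SELECT pf.protein_feature_id, pf.translation_id, a."database", a.program
        FROM protein_feature AS pf
        JOIN analysis AS a ON a.analysis_id = pf.analysis_id
        WHERE a."database" LIKE '%PRINTS%' OR a.program LIKE '%PRINTS%'
    """)
    # Main-table flag: 1 if the translation has at least one dimension row, else 0.
    return conn.execute("""
        SELECT t.translation_id,
               EXISTS (SELECT 1 FROM protein_feature_prints_dm AS d
                       WHERE d.translation_id = t.translation_id) AS protein_feature_prints_bool
        FROM translation AS t
        ORDER BY t.translation_id
    """).fetchall()


if __name__ == "__main__":
    print(derive_prints_flag(build_toy_core_db()))  # [(1, 1), (2, 0), (3, 0)]
```

Only translation 1 is linked (through protein_feature 10) to an analysis mentioning PRINTS, so it is the only one flagged. Computing the flag from the dimension table, rather than re-running the three-way join, mirrors the relationship Andrea describes between translation_main and protein_feature_prints_dm.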
