Hi Andrea,
OK, I now have a better idea of what you want. The situation is as follows: 0.8
rc5 has automated and nicely integrated many of the workflows needed to create
a new mart from a source schema. However, the Ensembl core transformation is a
very complex one, and rc5 still has only rudimentary support for it. If you
just want to get an idea of how the algorithm works, it is better to start with
a simpler use case than the Ensembl mart. It will be difficult to recreate the
exact Ensembl mart transformation from scratch for two reasons. First, rc5
still has only rudimentary support for this particular schema, so you will not
get far. Second, 0.7 fully supports it, but there are a large number of
'tweaks aka hacks' to the transformation algorithm to get certain things to
work, so you will find it difficult to recreate many of them.
I would advise you to play with any schema to get a few datasets working (the
Ensembl core schema is fine to play with too). If you want to build new marts
and integrate them with Ensembl, definitely go for rc5 and treat the existing
Ensembl mart as a black box; the software provides the means to integrate it
with your newly created mart through a backwards-compatibility mechanism. This
is much easier in 0.8 rc5. If, however, you just want to see exactly how the
Ensembl mart transformation is achieved, you will need a 0.7 XML
transformation file from the Ensembl team.

FYI: A short description of the basic transformation algorithm below:

Starting from one or more input “candidate” tables, the software finds the
largest set of table joins it can perform using only 1:1 and many-to-one (M:1)
relations, and merges these tables together to create the main table. Multiple
candidate tables can be given as input, in which case the algorithm creates a
main table out of each selected candidate table and, if unable to do so,
creates several separate datasets. Once the main tables are completed, if there
is a 1:M relation between them they become main and sub-main tables. If there
is no 1:M relation between them, they are split into separate datasets. Any
tables that have a 1:M or many-to-many (M:N) relation with the newly created
main table or sub-main table are made into independent dimension tables.
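
A minimal Python sketch of the steps described above, for illustration only. It
is not the BioMart implementation: the function names (build_main, link_mains),
the relation tuples and the example cardinalities are all hypothetical, and it
assumes that joins are followed transitively from the candidate table.

```python
from collections import defaultdict

# Cardinality as seen from the other end of a relation.
FLIP = {"1:1": "1:1", "M:1": "1:M", "1:M": "M:1", "M:N": "M:N"}


def neighbours(relations):
    """Index relations so each table lists (other_table, cardinality) both ways."""
    index = defaultdict(list)
    for left, right, card in relations:
        index[left].append((right, card))
        index[right].append((left, FLIP[card]))
    return index


def build_main(candidate, relations):
    """Return (merged_tables, dimension_tables) for one candidate table."""
    index = neighbours(relations)
    merged = {candidate}
    frontier = [candidate]
    # Absorb every table reachable through 1:1 or M:1 joins only.
    while frontier:
        table = frontier.pop()
        for other, card in index[table]:
            if card in ("1:1", "M:1") and other not in merged:
                merged.add(other)
                frontier.append(other)
    # Tables hanging off the merged set by 1:M or M:N become dimensions.
    dimensions = {
        other
        for table in merged
        for other, card in index[table]
        if card in ("1:M", "M:N") and other not in merged
    }
    return merged, dimensions


def link_mains(main_a, main_b, relations):
    """Return (main, sub_main) if the two are linked 1:M, else None (separate datasets)."""
    for other, card in neighbours(relations)[main_a]:
        if other == main_b and card == "1:M":
            return (main_a, main_b)
        if other == main_b and card == "M:1":
            return (main_b, main_a)
    return None


# Hypothetical relations, written as (left, right, cardinality of left:right).
relations = [
    ("translation", "transcript", "M:1"),
    ("transcript", "gene", "M:1"),
    ("translation", "protein_feature", "1:M"),
]

merged, dimensions = build_main("translation", relations)
print(sorted(merged), sorted(dimensions))
print(link_mains("gene", "transcript", relations))
```

With these made-up relations, choosing translation as the candidate merges
transcript and gene into the main table and leaves protein_feature as a
dimension table.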


Please let us know if you have any more questions,
a
Arek Kasprzyk
Director, Bioinformatics Operations and Principal Investigator

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Tel:       416-673-8559
Toll-free:           1-866-678-6427
www.oicr.on.ca

Administrative Assistant: [email protected]

From: Andrea Edwards <[email protected]>
Date: Sat, 12 Mar 2011 14:13:24 -0500
To: [email protected]
Subject: Re: [BioMart Users] how to get from ensembl main database schema to 
ensembl mart schema

Hi

I am not concerned whether I use BioMart 0.7 or 0.8 - whichever is
easiest for what I would like to do. I haven't done anything yet and I'm
starting from scratch.

All I want to do is have a go at re-creating the Ensembl mart from the
Ensembl core databases. I wanted to do this because Ensembl is an
example of a database whose schema I am familiar with and whose mart I
have used. I wanted to do this for 3 reasons:
a) to get some practice
b) to get an intuition of what type of mart I can create from my own
database schema, what types of query I can run, and what the
filters/attributes will be
c) to get an idea of how I could integrate my database with Ensembl, as I
believe they only need to share ids or an underlying assembly to be integrated

Will I be able to recreate the Ensembl mart in BioMart 0.8? I presume
the Ensembl XML files are available for 0.7 and I won't be able to read
them in 0.8? Without these files, how will I know the exact steps Ensembl
used to specify their mart structure? How will I know what main tables
they chose or how, for example, they created the PRINTS dimension table
mentioned in my original query?

Thanks a lot


On 12/03/2011 18:59, Arek Kasprzyk wrote:
Putting this back on the list to keep everyone else in the loop

a



On 2011-03-12, at 13:56, "Arek Kasprzyk" <[email protected]> wrote:

If you are starting from scratch it would be much better to start with
0.8 rc5. Creating a new mart is as simple as choosing one or more main
tables in the source schema. You can choose different tables and
create different datasets. There is some documentation about it in
rc5. If you want to know how the transformation algorithm works, I can
describe that to you too.


a



On 2011-03-12, at 12:53, "Andrea Edwards" <[email protected]> wrote:

ok - thanks

I don't know much about BioMart, as you can probably tell, but I was told
there are quite significant differences between 0.7 and 0.8.
If I am interested in understanding how the schema transformations take
place so that I can design my own mart and integrate it with existing
marts, would I be better dropping back to 0.7? I'm keen to get a mart up
and running very soon.

On 12/03/2011 17:41, Arek Kasprzyk wrote:
0.8 rc5 still has only rudimentary support for the MBuilder
component. You will not be able to read 0.7 MBuilder XML with it.
(CCing Junjun, who has just taken over the coordination of BioMart
development, to let him know that such discussions are taking place.)

a



On 2011-03-12, at 12:28, "Andrea Edwards" <[email protected]> wrote:

Brilliant - thanks for such a prompt reply.

I note that you say MBuilder (0.7), whereas I have checked out the code
for BioMart 0.8 rc4.


On 12/03/2011 16:39, Arek Kasprzyk wrote:
Hi Andrea
All the transformation information is stored in the XML file that
MBuilder (0.7) uses to compile its DDL for Ensembl core databases. I
am sure the Ensembl mart team will be happy to provide you with the
latest version.

a



On 2011-03-12, at 11:15, "Andrea Edwards" <[email protected]> wrote:

Hello

I was wondering if there were any documents showing how the Ensembl
marts were created from the main Ensembl databases. Specifically, I was
hoping there were documents describing what tables were selected as main
tables for the marts and how the dimension tables were mapped to the
main tables.

As an example, ensembl_mart_61 contains a main table for human named
translation_main (this is an abbreviation of the name but it's obvious
which one I mean), and this has a field called
protein_feature_prints_bool, which is essentially a boolean field
indicating whether a protein translation is associated with a row in the
PRINTS dimension table protein_feature_prints_dm. If the translation
does have a row in this dimension table then I am guessing it has a
PRINTS domain in it!

The core database itself, however, has a table called translation which
represents, well, a translation. Translations are linked to rows in a
table called 'protein_feature', which in turn has a foreign key called
analysis_id that links to an 'analysis' table with fields 'database'
and 'program'. So in this schema, a translation is associated with a
PRINTS annotation if it is linked to a 'protein_feature' record which is
in turn linked to an 'analysis' record with the text 'PRINTS' somewhere
in either or both of the database/program fields.
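
The relationship described above can be sketched with Python's sqlite3 module
as follows. This is only a guess at how such a boolean flag could be derived,
not how the mart builder actually does it. The tables are cut down to the
columns named above, the translation_id link in protein_feature is an assumed
column name, and the sample rows are invented for illustration.

```python
import sqlite3

# Toy in-memory schema holding only the columns discussed above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE translation (translation_id INTEGER PRIMARY KEY);
CREATE TABLE analysis (
    analysis_id INTEGER PRIMARY KEY,
    "database"  TEXT,
    program     TEXT
);
CREATE TABLE protein_feature (
    protein_feature_id INTEGER PRIMARY KEY,
    translation_id     INTEGER REFERENCES translation (translation_id),
    analysis_id        INTEGER REFERENCES analysis (analysis_id)
);
-- Invented sample rows: translation 1 has a PRINTS hit, translation 2 does not.
INSERT INTO translation VALUES (1), (2);
INSERT INTO analysis VALUES (10, 'PRINTS', 'FingerPRINTScan'), (11, 'Pfam', 'hmmpfam');
INSERT INTO protein_feature VALUES (100, 1, 10), (101, 2, 11);
""")


def prints_flags(connection):
    """Return (translation_id, flag) rows; flag is 1 if any PRINTS feature exists."""
    query = """
        SELECT t.translation_id,
               EXISTS (
                   SELECT 1
                   FROM protein_feature pf
                   JOIN analysis a ON a.analysis_id = pf.analysis_id
                   WHERE pf.translation_id = t.translation_id
                     AND (a."database" LIKE '%PRINTS%' OR a.program LIKE '%PRINTS%')
               ) AS protein_feature_prints_bool
        FROM translation t
        ORDER BY t.translation_id
    """
    return connection.execute(query).fetchall()


print(prints_flags(conn))
```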

I am interested in how the BioMart software is configured with 'rules'
to create the mart schema from the database schema. Is there a
configuration file with these rules in it that I could look at? Is there a
worked example? As an academic exercise I'd like to recreate the Ensembl
marts. I have the BioMart user manual but even with that document I do
not know how to recreate the Ensembl marts.

I am NOT specifically interested in protein domains. I used the PRINTS
example purely for illustrative purposes as I thought it was a
straightforward example. I am interested in how you specify the 'rules'
to get from a schema to a mart.

thanks a lot

_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users

