Hi Andrea,
BioMart has a built-in algorithm for the 3NF -> mart schema (reverse star)
transformation (we'll be publishing this algorithm in a paper to be submitted
shortly). The system needs user input to tell it which tables should be used
as main tables, but the rest is done automatically. The system will correctly
transform any schema that complies with 3NF into a reverse star.
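
As a concrete illustration of the idea, here is a minimal sketch in Python
with SQLite. It is not BioMart's actual algorithm (which is not shown in this
thread). It assumes a toy 3NF schema (gene, transcript), a hypothetical helper
build_reverse_star, and the __main / __dm table-name suffixes as a naming
assumption. The user picks the main table; each 1:n child table then becomes
a dimension table keyed by the main table's key.

```python
import sqlite3


def build_reverse_star(conn, main_table, key, children):
    """Copy the user-chosen main table and build one dimension table per child.

    Hypothetical helper for illustration only; table and column names are
    interpolated directly, so use it only with trusted identifiers.
    """
    cur = conn.cursor()
    main_name = f"{main_table}__main"
    cur.execute(f"CREATE TABLE {main_name} AS SELECT * FROM {main_table}")
    created = [main_name]
    for child in children:
        dm_name = f"{main_table}__{child}__dm"
        # Keep only child rows whose foreign key matches a main-table row,
        # so every dimension row can be reached from the main table's key.
        cur.execute(
            f"CREATE TABLE {dm_name} AS "
            f"SELECT c.* FROM {child} AS c "
            f"JOIN {main_table} AS m ON c.{key} = m.{key}"
        )
        created.append(dm_name)
    conn.commit()
    return created


# Toy 3NF source schema (an assumption, not taken from any real database).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transcript (
        transcript_id INTEGER PRIMARY KEY,
        gene_id INTEGER REFERENCES gene(gene_id),
        biotype TEXT
    );
    INSERT INTO gene VALUES (1, 'BRCA2'), (2, 'TP53');
    INSERT INTO transcript VALUES
        (10, 1, 'protein_coding'),
        (11, 1, 'retained_intron'),
        (12, 2, 'protein_coding');
""")

# The only user input is which table is the main table and which hang off it.
tables = build_reverse_star(conn, "gene", "gene_id", ["transcript"])
```

The single piece of user input is the choice of "gene" as the main table;
everything else follows from the foreign keys, which is why a schema in 3NF
is enough for the rest to be derived mechanically.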

The virtual mart is definitely quicker to set up, but for large datasets it
will be slower to query than its materialized counterpart. The virtual mart
option is recommended for quick prototyping or for small datasets. The
materialized mart, just like a materialized view in a relational database,
offers the benefits of query optimization. You can create a virtual mart from
a remote server, but in order to materialize it both the source schema and the
materialized mart have to be on the same server (the materialization process
relies on DDL statements that involve both the source and the materialized
schema, and these need to be executed on the same server).
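
The trade-off can be sketched with plain SQL, run here through Python's
sqlite3 module. This is an analogy under stated assumptions, not what BioMart
executes: the toy tables and the names gene_transcript_virtual and
gene_transcript_materialized are made up. The view stands in for the virtual
mart (the join runs on every query); the CREATE TABLE ... AS SELECT stands in
for materialization (the join runs once, and the result can be indexed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transcript (
        transcript_id INTEGER PRIMARY KEY,
        gene_id INTEGER,
        biotype TEXT
    );
    INSERT INTO gene VALUES (1, 'BRCA2'), (2, 'TP53');
    INSERT INTO transcript VALUES
        (10, 1, 'protein_coding'),
        (11, 1, 'retained_intron'),
        (12, 2, 'protein_coding');

    -- "Virtual": only the definition is stored; the join runs per query.
    CREATE VIEW gene_transcript_virtual AS
        SELECT g.gene_id, g.name, t.transcript_id, t.biotype
        FROM gene AS g JOIN transcript AS t ON t.gene_id = g.gene_id;

    -- "Materialized": the join runs once and the result is stored.
    -- This one DDL statement reads the source and writes the target,
    -- so both must be visible to the server that executes it.
    CREATE TABLE gene_transcript_materialized AS
        SELECT g.gene_id, g.name, t.transcript_id, t.biotype
        FROM gene AS g JOIN transcript AS t ON t.gene_id = g.gene_id;

    -- A stored table can be indexed for the queries you expect.
    CREATE INDEX idx_materialized_name
        ON gene_transcript_materialized (name);
""")

query = "SELECT transcript_id FROM {} WHERE name = 'BRCA2' ORDER BY transcript_id"
virtual_rows = conn.execute(query.format("gene_transcript_virtual")).fetchall()
materialized_rows = conn.execute(
    query.format("gene_transcript_materialized")
).fetchall()
```

Both queries return the same rows; the difference is only where the join cost
is paid, which is why the gap matters for large datasets and is negligible
for small ones.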

Hope this helps,
a




Arek Kasprzyk
Director, Bioinformatics Operations and Principal Investigator

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Tel:       416-673-8559
Toll-free:           1-866-678-6427
www.oicr.on.ca


From: Andrea Edwards <[email protected]>
Date: Fri, 4 Mar 2011 14:34:02 -0500
To: Joachim Baran <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [BioMart Users] installing biomart and memory requirements

Thanks for your reply. I have a few more questions which will probably
be obvious once I've made my own mart but for now are puzzling me.

How does the system 'know' how to rewrite the schema? For example, how
does it know which tables to use as the central 'fact' tables (I have
seen them called focus tables in an old EnsMart paper)? I'm wondering how
it is possible for any database schema to be compatible. This might be
obvious once you have seen it done. I'd like to be in a position where,
if a new database is published, I can be sure I can add it to my
existing mart regardless of its schema. I appreciate this might not have
a simple answer, but a concrete example would be really useful if possible.

Is a physical mart quicker than a virtual mart? I presume that is the
benefit of materializing over not materializing.
Do I have to have a virtual mart if the source database is on a remote
server, or can materialization get the data from the remote database?

If it does do all of these things then I'll definitely be using it!

On 04/03/2011 16:52, Joachim Baran wrote:
Hey!

    I have CC'd the mailing list here, so other people can benefit from the
conversation too. Hope that is alright.

On 11-03-04 11:15 AM, "Andrea Edwards" <[email protected]> wrote:
I believe biomart is capable of producing one query-optimized
system from this data. Is this correct?
    Yes. The query-optimised system is generated when you select a source,
right-click, and then select 'Materialize'. This will take your local
data-sources and rewrite them for query optimisation. You do not have to
do that though -- you can run the system with your databases as they are.
This is what we call a 'Virtual Mart'.

Will there be one database that incorporates all this data on my machine
at the end of it?
    If you materialise your databases, all the local data-sources will be
put in one database.

Do all the databases that I wish to incorporate have to be on the same
machine? I'm
guessing not if it's a federated data model.
    You can use pointed attributes to incorporate data from other machines.
For example, you could mash up your data with Ensembl's marts if you like.

What happens if the schema changes for one of the databases, do I have
to rebuild the whole lot?
    Depends. You need to run 'Update' on the data-source that has changed
(again, right-click on the data-source). By doing so, MartConfigurator
will pick up on changes in the schema, such as added/deleted columns in
your database. If you run a virtual mart, just hit save and re-deploy the
mart. If you run a materialised mart, materialise it again and you are
ready to go.

I also believe biomart has tools for automatically generating a web
interface too.
    Yes. In fact, when you hit deploy in MartConfigurator, a Jetty server
will launch in the background and after a few seconds a browser window
will pop up pointing to your deployed mart. The web interface comes "for
free". No need to configure anything, really.

I believe queries on this interface will automatically
generate perl and java code to query the resource directly
    You got Perl + Java in BioMart 0.7. In BioMart 0.8 you can generate Java
from queries, or XML that can be used to run automated queries via our
RESTful interface, and there will be a SPARQL interface soon.
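
To give a feel for what such a query document looks like, here is a minimal
sketch that builds one with Python's standard library. The element and
attribute names are modeled on the query XML used by BioMart 0.7; the 0.8
format may differ, so treat the layout as an assumption. build_query_xml is a
hypothetical helper, and the dataset, filter and attribute names are
placeholders. The sketch only builds the XML string and does not contact any
server.

```python
import xml.etree.ElementTree as ET


def build_query_xml(dataset, attributes, filters=None):
    """Build a BioMart-0.7-style query document as a string.

    Hypothetical helper; the layout is modeled on 0.7 query XML and is an
    assumption for any other BioMart version.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which rows come back.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes choose which columns come back.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Placeholder names for illustration.
xml_query = build_query_xml(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id"],
    {"chromosome_name": "1"},
)
```

The resulting string is what you would submit to the mart's query endpoint;
the endpoint URL and parameter name depend on the deployment and are left out
here on purpose.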

    Perhaps you are more familiar with the old BioMart 0.7 marts. You can
have a look at the new BioMart 0.8 here: http://dcc.icgc.org/

Joachim


_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
