Hi Junjun,

The problem here is that the three expression databases they host at MRC-HGU
come from different projects, contain quite different data, and have rightly
been configured as separate datasets, so partitioning would not be desirable.
The situation is therefore three separate expression datasets. The results you
get from linking by gene and from linking by anatomy will sometimes be very
different, but this is a valid use case.

Cheers
Damian


On Sat, May 28, 2011 at 3:26 AM, Junjun Zhang <[email protected]> wrote:

> Hi Damian,
>
> Sorry for the delay.
>
> If I understand you correctly, the current system already supports this.
> Here is how I see it:
>
> You have three data sources:
>   expression (partitioned into multiple datasets, e.g., expression1,
>   expression2, expression3)
>   gene (single dataset)
>   anatomy (single dataset)
>
> The expression data source is linked to the gene dataset via gene_id and to
> the anatomy dataset via anatomic_term_id.
>
> Now, create one config (config1) for expression and add gene_symbol as a
> pointer filter pointing to the gene_symbol filter in the gene dataset.
> Similarly, create another config (config2) for expression and add
> anatomical_term as a pointer filter pointing to the anatomical_term filter
> in the anatomy dataset.
>
> Finally, the queries:
>
> 1. Give me all results from all the datasets for gene X:
>
> <Dataset name="expression1,expression2,expression3" config="config1">
>   <Filter name="gene_symbol" value="gene X"/>
>   <Attribute name="xxxxxxx"/>
>   <!-- more expression attributes here -->
> </Dataset>
>
> 2. Give me all results from all the datasets for anatomical term X:
>
> <Dataset name="expression1,expression2,expression3" config="config2">
>   <Filter name="anatomical_term" value="term X"/>
>   <Attribute name="xxxxxxx"/>
>   <!-- more expression attributes here -->
> </Dataset>
>
> Both queries return the union of results from the three expression
> datasets: expression1, expression2 and expression3. For the queries to work
> properly, the link does not need to be config-specific. When a query uses a
> pointer filter, the BioMart query engine picks up the correct link to
> perform the join.
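>
> Since the name attribute takes a comma-separated list of datasets and the
> results are the union over that list, the same config1 query can presumably
> be narrowed to a single project by naming only one dataset. This is a sketch
> under that assumption; expression2 is one of the example dataset names
> above, and "xxxxxxx" is still a placeholder attribute.
>
> <Dataset name="expression2" config="config1">
>   <!-- pointer filter resolved through the gene link, as in query 1 -->
>   <Filter name="gene_symbol" value="gene X"/>
>   <Attribute name="xxxxxxx"/>
> </Dataset>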
>
> These queries are similar to the following one, which is a real query from
> the ICGC data portal. It returns the methylation results of two cancers for
> genes involved in the 'Apoptosis' pathway.
>
> <Dataset name="hsapiens_gene_ensembl_tcgaREAD,hsapiens_gene_ensembl_tcgaSTAD"
>          config="gene_ensembl_config">
>   <!-- _displayname is a pointer filter from the pathway dataset; the pathway
>        and gene datasets are linked via ensembl_gene_id -->
>   <Filter name="_displayname" value="Apoptosis"/>
>   <Attribute name="cancertype"/>
>   <Attribute name="ensembl_gene_id"/>
>   <Attribute name="hsapiens_gene_ensembl__methylation__dm__tumour_sample_id"/>
>   <Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_1"/>
>   <Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_2"/>
> </Dataset>
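>
> When submitted, a Dataset fragment like the ones above is wrapped in a
> Query element. The sketch below does this for query 1. The DOCTYPE line and
> the client, processor, limit and header attributes are assumed from the
> BioMart 0.8 query format and should be checked against your installation;
> "xxxxxxx" remains a placeholder.
>
> <!DOCTYPE Query>
> <Query client="biomartclient" processor="TSV" limit="-1" header="1">
>   <!-- union over all three expression datasets, joined to gene via config1 -->
>   <Dataset name="expression1,expression2,expression3" config="config1">
>     <Filter name="gene_symbol" value="gene X"/>
>     <Attribute name="xxxxxxx"/>
>   </Dataset>
> </Query>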
>
> Let me know if that makes sense, or if I have completely missed the point.
>
> Cheers,
> Junjun
>
>
> From: Damian Smedley <[email protected]>
> Date: Thu, 26 May 2011 09:14:55 -0400
> To: "[email protected]" <[email protected]>
> Subject: [BioMart Users] setting up multiple links between the same datasets
>
> Hi,
>
> Just helping set up some new expression database BioMarts. We want one
> config where the datasets are linked by gene, for the use case "Give me all
> results from all the datasets for gene X", but we also want another config
> linked by anatomical term, to satisfy the query "Give me all results from
> all the datasets for anatomical term X".
>
> But linking seems to be set up at the dataset level rather than the config
> level. Is there a way round this?
>
> Thanks
> Damian
>
>
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
