Hi Junjun,

The problem here is that the three expression databases they host at MRC-HGU come from different projects, contain quite different data, and have rightly been configured as separate datasets. Partitioning would not be desirable here. So the situation is three separate expression datasets. The results you would get from linking by gene and from linking by anatomy will sometimes be very different, but this is a valid use case.

Cheers
Damian

On Sat, May 28, 2011 at 3:26 AM, Junjun Zhang <[email protected]> wrote:

> Hi Damian,
>
> Sorry for the delay.
>
> If I understand you correctly, the current system already supports it. Here
> is how I see it:
>
> You have three data sources:
>   expression (partitioned with multiple datasets, e.g. expression1,
>     expression2, expression3)
>   gene (single dataset)
>   anatomy (single dataset)
>
> Data source expression is linked with the gene dataset via gene_id, and it
> is linked with the anatomy dataset via anatomic_term_id.
>
> Now, create a config (config1) of expression and add gene_symbol as a
> pointer filter pointing to the gene_symbol filter in the gene dataset.
> Similarly, create another config (config2) for expression and add
> anatomical_term as a pointer filter pointing to the anatomical_term filter
> in the anatomy dataset.
>
> Finally, the queries:
>
> 1. Give me all results from all the datasets for gene X:
>
> <dataset name="expression1,expression2,expression3" config="config1">
>   <filter name="gene_symbol" value="gene X"/>
>   <attribute name="xxxxxxx"/>
>   <!-- more expression attributes here -->
> </dataset>
>
> 2. Give me all results from all the datasets for anatomical term X:
>
> <dataset name="expression1,expression2,expression3" config="config2">
>   <filter name="anatomical_term" value="term X"/>
>   <attribute name="xxxxxxx"/>
>   <!-- more expression attributes here -->
> </dataset>
>
> Both queries will return the union of results from the three expression
> datasets: expression1, expression2, expression3. For the queries to work
> properly, the link does not need to be config specific. When a pointer
> filter is picked up in the query, the BioMart query engine will be able to
> pick up the correct link to perform the join.
>
> These queries are similar to the following one, which is a real query from
> the ICGC data portal. This query gives you the methylation results of two
> cancers for genes involved in the 'Apoptosis' pathway.
>
> <Dataset name="hsapiens_gene_ensembl_tcgaREAD,hsapiens_gene_ensembl_tcgaSTAD"
>          config="gene_ensembl_config">
>   <Filter name="_displayname" value="Apoptosis"/>
>   <!-- this is a pointer filter from pathway dataset, pathway and gene
>        datasets are linked via ensembl_gene_id -->
>   <Attribute name="cancertype"/>
>   <Attribute name="ensembl_gene_id"/>
>   <Attribute name="hsapiens_gene_ensembl__methylation__dm__tumour_sample_id"/>
>   <Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_1"/>
>   <Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_2"/>
> </Dataset>
>
> Let me know if that makes sense or if I just completely missed the point.
>
> Cheers,
> Junjun
>
>
> From: Damian Smedley <[email protected]>
> Date: Thu, 26 May 2011 09:14:55 -0400
> To: "[email protected]" <[email protected]>
> Subject: [BioMart Users] setting up multiple links between the same
> datasets
>
> Hi,
>
> Just helping set up some new expression database BioMarts. We want to have
> one config where the datasets are linked by gene for the use case "Give me
> all results from all the datasets for gene X". But we also want to have
> another config linked by anatomical term to satisfy the query "Give me all
> results from all the datasets for anatomical term X".
>
> But linking seems to be set up at the dataset rather than config level? Is
> there a way round this?
>
> Thanks
> Damian
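
For anyone scripting this: the two example queries in Junjun's message differ only in the config name and in the filter name and value, so they can be generated from one small helper. Below is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The function name build_query and the constant EXPRESSION_DATASETS are made up for illustration, the dataset, config, filter and attribute names are the placeholders from the message, and the code only assembles the XML fragment without contacting a BioMart server. Whether separately configured (non-partitioned) datasets can be listed together this way is exactly the open question in this thread, so treat it as an illustration of the query shape only.

```python
import xml.etree.ElementTree as ET


def build_query(datasets, config, filter_name, filter_value, attributes):
    """Assemble a <dataset> query fragment shaped like the quoted examples.

    Hypothetical helper: every value passed in is a placeholder, and
    nothing is sent to a server.
    """
    # Dataset names are joined with commas, as in the quoted queries.
    dataset = ET.Element(
        "dataset",
        {"name": ",".join(datasets), "config": config},
    )
    # The pointer filter chosen here is what distinguishes the two queries.
    ET.SubElement(dataset, "filter", {"name": filter_name, "value": filter_value})
    for attr in attributes:
        ET.SubElement(dataset, "attribute", {"name": attr})
    return ET.tostring(dataset, encoding="unicode")


# Placeholder names taken from the quoted message.
EXPRESSION_DATASETS = ["expression1", "expression2", "expression3"]

# Query 1: link by gene, using config1.
by_gene = build_query(
    EXPRESSION_DATASETS, "config1", "gene_symbol", "gene X", ["xxxxxxx"]
)
# Query 2: link by anatomical term, using config2.
by_anatomy = build_query(
    EXPRESSION_DATASETS, "config2", "anatomical_term", "term X", ["xxxxxxx"]
)

print(by_gene)
print(by_anatomy)
```

The same dataset list is reused for both calls, so the only thing that changes between the "by gene" and "by anatomy" use cases is the config and the pointer filter.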
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
