Hi Junjun,

Thanks for the tips. Looks like for the anatomical term query either a
MartReport solution or just creating a simplified combined partitioned
dataset with the common fields may be the way to go then.

Cheers
Damian

On Mon, May 30, 2011 at 4:36 AM, Junjun Zhang <[email protected]> wrote:

> Hi Damian,
>
> OK, I see. So you basically want to apply the same filter (eg,
> anatomic_term or gene) to three different expression datasets and retrieve
> results from all of them. It makes sense if this runs as three queries and
> returns three separate sets of results. If it were to run as one query
> where three datasets are joined, it would be possible to run, but the
> results would be very difficult to understand. For example, when applying
> one anatomic_term, dataset1 returns 10 rows, dataset2 returns 15 rows,
> dataset3 returns 20 rows; when they are joined, it will produce
> 10 x 15 x 20 = 3000 rows in the resulting table.
>
> Other than joining the three datasets, it might be better to utilize
> MartReport to retrieve data from three datasets and present them on the
> same page as different sections. Let's say we create an anatomy dataset,
> then link it to all three expression datasets with anatomic_term (or
> anatomy_id, or something). Next, we create a report for the anatomy
> dataset choosing anatomic_term as the primary identifier. Last, add three
> expression datasets to the report as three different sections. Here is an
> example of a report for a gene at the ICGC data portal:
> http://dcc.icgc.org/martreport/?report=report&mart=gene_report&ensembl_gene_id=ENSG00000141510
>
> Hope this helps,
>
> Junjun
>
>
> From: Damian Smedley <[email protected]>
> Date: Sat, 28 May 2011 05:22:27 -0400
> To: jzhang <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: [BioMart Users] setting up multiple links between the same
> datasets
>
> Hi Junjun,
>
> The problem here is that the three expression databases they host at
> MRC-HGU are from different projects and contain quite different data and
> have been rightly configured as separate datasets. Partitioning would not
> be desirable here. So the situation is 3 expression datasets. The results
> you would get from linking by gene and linking by anatomy will sometimes
> be very different but this is a valid use case.
>
> Cheers
> Damian
>
>
> On Sat, May 28, 2011 at 3:26 AM, Junjun Zhang <[email protected]> wrote:
>
>> Hi Damian,
>>
>> Sorry for the delay.
>>
>> If I understand you correctly, the current system already supports it.
>> Here is how I see it:
>>
>> You have three data sources:
>> expression (partitioned with multiple datasets, eg, expression1,
>> expression2, expression3)
>> gene (single dataset)
>> anatomy (single dataset)
>>
>> Data source expression is linked with gene dataset via gene_id, and it is
>> linked with anatomy via anatomic_term_id.
>>
>> Now, create a config (config1) of expression and add gene_symbol as a
>> pointer filter pointing to gene_symbol filter in gene dataset. Similarly
>> create another config (config2) for expression and add anatomical_term as
>> a pointer filter pointing to anatomical_term filter in anatomy dataset.
>>
>> Finally, the queries:
>>
>> 1. Give me all results from all the datasets for gene X:
>>
>> <dataset name="expression1,expression2,expression3" config="config1">
>>   <filter name="gene_symbol" value="gene X"/>
>>   <attribute name="xxxxxxx"/>
>>   <!-- more expression attributes here -->
>> </dataset>
>>
>> 2. Give me all results from all the datasets for anatomical term X:
>>
>> <dataset name="expression1,expression2,expression3" config="config2">
>>   <filter name="anatomical_term" value="term X"/>
>>   <attribute name="xxxxxxx"/>
>>   <!-- more expression attributes here -->
>> </dataset>
>>
>> Both queries will return the union of results from three expression
>> datasets: expression1, expression2, expression3. For the queries to work
>> properly, the link does not need to be config specific. When a pointer
>> filter is picked up in the query, the BioMart query engine will be able
>> to pick up the correct link to perform the join.
>>
>> These queries are similar to the following one, which is a real query
>> from the ICGC data portal. This query gives you the methylation results
>> of two cancers for genes involved in the 'Apoptosis' pathway.
>>
>> <Dataset
>>   name="hsapiens_gene_ensembl_tcgaREAD,hsapiens_gene_ensembl_tcgaSTAD"
>>   config="gene_ensembl_config">
>>   <Filter name="_displayname" value="Apoptosis"/>
>>   <!-- this is a pointer filter from pathway dataset, pathway and gene
>>        datasets are linked via ensembl_gene_id -->
>>   <Attribute name="cancertype"/>
>>   <Attribute name="ensembl_gene_id"/>
>>   <Attribute name="hsapiens_gene_ensembl__methylation__dm__tumour_sample_id"/>
>>   <Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_1"/>
>>   <Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_2"/>
>> </Dataset>
>>
>> Let me know if that makes sense or I just completely missed the point.
>>
>> Cheers,
>> Junjun
>>
>>
>> From: Damian Smedley <[email protected]>
>> Date: Thu, 26 May 2011 09:14:55 -0400
>> To: "[email protected]" <[email protected]>
>> Subject: [BioMart Users] setting up multiple links between the same
>> datasets
>>
>> Hi,
>>
>> Just helping set up some new expression database BioMarts. We want to
>> have one config where the datasets are linked by gene for the use case
>> "Give me all results from all the datasets for gene X". But we also want
>> to have another config linked by anatomical term to satisfy the query
>> "Give me all results from all the datasets for anatomical term X".
>>
>> But linking seems to be set up at the dataset rather than config level?
>> Is there a way round this?
>>
>> Thanks
>> Damian
>>
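
The row counts in the join example above (10, 15 and 20 rows for one
anatomic_term) can be checked with a minimal Python sketch. It is only an
illustration of the arithmetic, not BioMart code: the dataset names and the
placeholder rows are assumptions, and the join is modelled as a plain
Cartesian product of rows that share the same term.

```python
from itertools import product

# Per-dataset hit counts for a single anatomic_term, as quoted in the thread.
# The dataset names are placeholders, not real BioMart datasets.
hits = {"dataset1": 10, "dataset2": 15, "dataset3": 20}

# Placeholder rows: (dataset name, row index). Real rows would carry attributes.
rows = {name: [(name, i) for i in range(n)] for name, n in hits.items()}

# Joining the three datasets on the shared term pairs every row of one
# dataset with every row of the other two (a Cartesian product).
joined = list(product(*rows.values()))

# Running three separate queries, or taking the union, just concatenates rows.
union = [row for dataset_rows in rows.values() for row in dataset_rows]

print(len(joined))  # 3000
print(len(union))   # 45
```

The union stays at 45 rows, which is why three separate result sets (or a
MartReport with three sections) are easier to read than a single joined table.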
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
