Hi Junjun,

Thanks for the tips.

Looks like for the anatomical term query either a MartReport solution or
just creating a simplified combined partitioned dataset with the common
fields may be the way to go then.
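
As a rough sketch of the second option, the common fields could be pulled out of each expression dataset and stacked into one table, with a column recording which dataset each row came from. All field names and rows below are invented for illustration; the real schemas are not shown in this thread.

```python
# Hypothetical rows from three expression datasets with different schemas.
expression1 = [{"gene_id": "g1", "anatomical_term": "heart", "stage": "TS20"}]
expression2 = [{"gene_id": "g2", "anatomical_term": "heart", "probe": "p7"}]
expression3 = [{"gene_id": "g1", "anatomical_term": "liver", "score": 0.8}]

sources = {
    "expression1": expression1,
    "expression2": expression2,
    "expression3": expression3,
}

# Fields present in every row of every dataset.
common_fields = sorted(
    set.intersection(
        *(set(row) for rows in sources.values() for row in rows)
    )
)

# One combined table: common fields only, plus the originating dataset.
combined = [
    {"source": name, **{field: row[field] for field in common_fields}}
    for name, rows in sources.items()
    for row in rows
]

# A single anatomical term filter now runs against one table.
heart_rows = [row for row in combined if row["anatomical_term"] == "heart"]
print(common_fields)
print(heart_rows)
```

The dataset-specific columns (stage, probe, score in this made-up data) are dropped, which is the trade-off of a simplified combined dataset.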

Cheers
Damian

On Mon, May 30, 2011 at 4:36 AM, Junjun Zhang <[email protected]> wrote:

> Hi Damian,
>
> OK, I see. So you basically want to apply the same filter (eg,
> anatomic_term or gene) to three different expression datasets and retrieve
> results from all of them. It makes sense if this runs as three queries and
> returns three separate sets of results. Running it as one query that joins
> the three datasets is possible, but the results would be very difficult to
> understand. For example, suppose that for one anatomic_term dataset1 returns
> 10 rows, dataset2 returns 15 rows and dataset3 returns 20 rows. Joining them
> produces 10 x 15 x 20 = 3000 rows in the resulting table.
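
The row-count arithmetic in this example can be checked with a small Python sketch. The row contents and variable names are made up for illustration; only the counts (10, 15, 20) come from the message, and the join is modelled as a plain cross product of rows that all share one anatomic_term.

```python
from itertools import product

# Hypothetical result rows for one anatomic_term; only the counts
# (10, 15, 20) come from the example in the message.
dataset1 = [("dataset1", i) for i in range(10)]
dataset2 = [("dataset2", i) for i in range(15)]
dataset3 = [("dataset3", i) for i in range(20)]

# One joined query: every row pairs with every matching row in the
# other two datasets, because all rows share the same anatomic_term.
joined_rows = list(product(dataset1, dataset2, dataset3))

# Three separate queries: each result set is returned on its own.
separate_rows = [dataset1, dataset2, dataset3]

print(len(joined_rows))                    # 3000
print(sum(len(r) for r in separate_rows))  # 45
```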
>
> Rather than joining the three datasets, it might be better to use
> martreport to retrieve data from the three datasets and present them on the
> same page as different sections. Let's say we create an anatomy dataset, then
> link it to all three expression datasets with anatomic_term (or anatomy_id,
> or something similar). Next, we create a report for the anatomy dataset,
> choosing anatomic_term as the primary identifier. Finally, we add the three
> expression datasets to the report as three different sections. Here is an
> example of a report for a gene at the ICGC data portal:
> http://dcc.icgc.org/martreport/?report=report&mart=gene_report&ensembl_gene_id=ENSG00000141510
>
> Hope this helps,
>
> Junjun
>
>
>
> From: Damian Smedley <[email protected]>
> Date: Sat, 28 May 2011 05:22:27 -0400
> To: jzhang <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: [BioMart Users] setting up multiple links between the same
> datasets
>
> Hi Junjun,
>
> The problem here is that the three expression databases they host at
> MRC-HGU are from different projects and contain quite different data and
> have been rightly configured as separate datasets. Partitioning would not be
> desirable here. So the situation is 3 expression datasets. The results you
> would get from linking by gene and linking by anatomy will sometimes be very
> different but this is a valid use case.
>
> Cheers
> Damian
>
>
> On Sat, May 28, 2011 at 3:26 AM, Junjun Zhang <[email protected]> wrote:
>
>> Hi Damian,
>>
>> Sorry for the delay.
>>
>> If I understand you correctly, the current system already supports it.
>> Here is how I see it:
>>
>> You have three data sources:
>>   expression (partitioned with multiple datasets, eg, expression1,
>> expression2, expression3)
>>   gene (single dataset)
>>   anatomy (single dataset)
>>
>> Data source expression is linked with gene dataset via gene_id, and it is
>> linked with anatomy via anatomic_term_id.
>>
>> Now, create a config (config1) of expression and add gene_symbol as a
>> pointer filter pointing to gene_symbol filter in gene dataset. Similarly
>> create another config (config2) for expression and add anatomical_term as a
>> pointer filter pointing to anatomical_term filter in anatomy dataset.
>>
>> Finally, the queries:
>>
>> 1. Give me all results from all the datasets for gene X:
>>
>> <dataset name="expression1,expression2,expression3" config="config1">
>>   <filter name="gene_symbol" value="gene X"/>
>>   <attribute name="xxxxxxx"/>
>>   <!-- more expression attributes here -->
>> </dataset>
>>
>> 2. Give me all results from all the datasets for anatomical term X:
>>
>> <dataset name="expression1,expression2,expression3" config="config2">
>>   <filter name="anatomical_term" value="term X"/>
>>   <attribute name="xxxxxxx"/>
>>   <!-- more expression attributes here -->
>> </dataset>
>>
>> Both queries will return the union of results from the three expression
>> datasets: expression1, expression2, expression3. For the queries to work
>> properly, the link does not need to be config-specific. When a pointer
>> filter is used in a query, the BioMart query engine will pick the correct
>> link to perform the join.
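
For anyone scripting these queries, the <dataset> element can be assembled programmatically. The following is only a sketch using Python's standard xml.etree.ElementTree; build_dataset_query is a hypothetical helper, not part of BioMart, and the element and attribute names simply mirror the two example queries in this message. It produces only the <dataset> fragment, not a complete query document.

```python
import xml.etree.ElementTree as ET


def build_dataset_query(datasets, config, filter_name, filter_value, attributes):
    """Build a <dataset> element like the two example queries.

    Hypothetical helper for illustration; it is not a BioMart API.
    """
    dataset = ET.Element(
        "dataset",
        {"name": ",".join(datasets), "config": config},
    )
    ET.SubElement(dataset, "filter", {"name": filter_name, "value": filter_value})
    for attribute in attributes:
        ET.SubElement(dataset, "attribute", {"name": attribute})
    return dataset


expression_datasets = ["expression1", "expression2", "expression3"]

# Query 1: filter by gene through config1.
gene_query = build_dataset_query(
    expression_datasets, "config1", "gene_symbol", "gene X", ["xxxxxxx"]
)
# Query 2: filter by anatomical term through config2.
anatomy_query = build_dataset_query(
    expression_datasets, "config2", "anatomical_term", "term X", ["xxxxxxx"]
)

print(ET.tostring(gene_query, encoding="unicode"))
print(ET.tostring(anatomy_query, encoding="unicode"))
```

Only the config and the pointer filter differ between the two calls; the dataset list stays the same.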
>>
>> These queries are similar to the following one which is a real query from
>> the ICGC data portal. This query gives you the methylation results of two
>> cancers for genes involved in the 'Apoptosis' pathway.
>>
>> <Dataset
>>   name="hsapiens_gene_ensembl_tcgaREAD,hsapiens_gene_ensembl_tcgaSTAD"
>>   config="gene_ensembl_config">
>>   <!-- this is a pointer filter from pathway dataset, pathway and gene
>>        datasets are linked via ensembl_gene_id -->
>>   <Filter name="_displayname" value="Apoptosis"/>
>>   <Attribute name="cancertype"/>
>>   <Attribute name="ensembl_gene_id"/>
>>   <Attribute
>>     name="hsapiens_gene_ensembl__methylation__dm__tumour_sample_id"/>
>>   <Attribute
>>     name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_1"/>
>>   <Attribute
>>     name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_2"/>
>> </Dataset>
>>
>> Let me know if that makes sense or I just completely missed the point.
>>
>> Cheers,
>> Junjun
>>
>>
>> From: Damian Smedley <[email protected]>
>> Date: Thu, 26 May 2011 09:14:55 -0400
>> To: "[email protected]" <[email protected]>
>> Subject: [BioMart Users] setting up multiple links between the same
>> datasets
>>
>> Hi,
>>
>> Just helping set up some new expression database BioMarts. We want to have
>> one config where the datasets are linked by gene for the use case "Give me
>> all results from all the datasets for gene X".  But we also want to have
>> another config linked by anatomical term to satisfy the query "Give me all
>> results from all the datasets for anatomical term X"
>>
>> But linking seems to be set up at the dataset rather than config level? Is
>> there a way round this?
>>
>> Thanks
>> Damian
>>
>>
>
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
