Cool, whichever way fits your needs better is a good choice.
Cheers,
Junjun

From: Damian Smedley <[email protected]>
Date: Thu, 2 Jun 2011 12:16:15 -0400
To: jzhang <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [BioMart Users] setting up multiple links between the same datasets


Hi Junjun,

Thanks for the tips.

Looks like for the anatomical term query either a MartReport solution or just 
creating a simplified combined partitioned dataset with the common fields may 
be the way to go then.

Cheers
Damian

On Mon, May 30, 2011 at 4:36 AM, Junjun Zhang <[email protected]> wrote:
Hi Damian,

OK, I see. So you basically want to apply the same filter (e.g., anatomic_term
or gene) to three different expression datasets and retrieve results from all
of them. It makes sense if this runs as three queries and returns three
separate sets of results. If it were run as one query in which the three
datasets are joined, it would be possible to run, but the results would be very
difficult to understand. For example, suppose that for one anatomic_term
dataset1 returns 10 rows, dataset2 returns 15 rows and dataset3 returns 20
rows. When they are joined, the resulting table will have 10 x 15 x 20 = 3000
rows.
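
To make the arithmetic concrete, here is a minimal Python sketch. It is not BioMart code: the row counts are the hypothetical ones from the example above, and the tuples are placeholders for result rows. It compares one joined query with three separate queries.

```python
from itertools import product

# Hypothetical result rows for one anatomic_term in each dataset.
dataset1 = [("dataset1", i) for i in range(10)]
dataset2 = [("dataset2", i) for i in range(15)]
dataset3 = [("dataset3", i) for i in range(20)]

# Joining on the shared term pairs every row with every row of the others.
joined = list(product(dataset1, dataset2, dataset3))

# Running three queries keeps the three result sets side by side.
separate = dataset1 + dataset2 + dataset3

print(len(joined))    # 3000
print(len(separate))  # 45
```

The joined table repeats each expression row many times, which is why the separate result sets are easier to read.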

Rather than joining the three datasets, it might be better to use MartReport
to retrieve data from the three datasets and present them on the same page as
different sections. Let's say we create an anatomy dataset, then link it to all
three expression datasets by anatomic_term (or anatomy_id, or something
similar). Next, we create a report for the anatomy dataset, choosing
anatomic_term as the primary identifier. Finally, we add the three expression
datasets to the report as three different sections. Here is an example of a
report for a gene at the ICGC data portal:
http://dcc.icgc.org/martreport/?report=report&mart=gene_report&ensembl_gene_id=ENSG00000141510

Hope this helps,

Junjun



From: Damian Smedley <[email protected]>
Date: Sat, 28 May 2011 05:22:27 -0400
To: jzhang <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [BioMart Users] setting up multiple links between the same datasets

Hi Junjun,

The problem here is that the three expression databases they host at MRC-HGU 
are from different projects and contain quite different data and have been 
rightly configured as separate datasets. Partitioning would not be desirable 
here. So the situation is 3 expression datasets. The results you would get from 
linking by gene and linking by anatomy will sometimes be very different but 
this is a valid use case.

Cheers
Damian


On Sat, May 28, 2011 at 3:26 AM, Junjun Zhang <[email protected]> wrote:
Hi Damian,

Sorry for the delay.

If I understand you correctly, the current system already supports this. Here
is how I see it:

You have three data sources:
  expression (partitioned with multiple datasets, eg, expression1, expression2, 
expression3)
  gene (single dataset)
  anatomy (single dataset)

The expression data source is linked with the gene dataset via gene_id, and
with the anatomy dataset via anatomic_term_id.

Now, create a config (config1) for expression and add gene_symbol as a pointer
filter pointing to the gene_symbol filter in the gene dataset. Similarly,
create another config (config2) for expression and add anatomical_term as a
pointer filter pointing to the anatomical_term filter in the anatomy dataset.

Finally, the queries:

1. Give me all results from all the datasets for gene X:

<dataset name="expression1,expression2,expression3" config="config1">
  <filter name="gene_symbol" value="gene X"/>
  <attribute name="xxxxxxx"/>
  <!-- more expression attributes here -->
</dataset>

2. Give me all results from all the datasets for anatomical term X:

<dataset name="expression1,expression2,expression3" config="config2">
  <filter name="anatomical_term" value="term X"/>
  <attribute name="xxxxxxx"/>
  <!-- more expression attributes here -->
</dataset>

Both queries will return the union of results from the three expression
datasets: expression1, expression2 and expression3. For the queries to work
properly, the link does not need to be config-specific. When a pointer filter
is used in a query, the BioMart query engine will be able to pick the correct
link to perform the join.
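
Since the two queries differ only in the config and the filter, they can be generated from one template. The following is a minimal Python sketch, assuming only the standard library. `build_query` is a hypothetical helper, not part of BioMart. It builds just the `<dataset>` fragment shown above and does not send anything to a server. The attribute name "xxxxxxx" is the placeholder used in the queries above.

```python
import xml.etree.ElementTree as ET

def build_query(datasets, config, filter_name, filter_value, attributes):
    """Build a <dataset> query fragment (hypothetical helper, not a BioMart API)."""
    ds = ET.Element("dataset", {"name": ",".join(datasets), "config": config})
    ET.SubElement(ds, "filter", {"name": filter_name, "value": filter_value})
    for attr in attributes:
        ET.SubElement(ds, "attribute", {"name": attr})
    return ET.tostring(ds, encoding="unicode")

expression = ["expression1", "expression2", "expression3"]

# Query 1: filter by gene through config1.
by_gene = build_query(expression, "config1", "gene_symbol", "gene X", ["xxxxxxx"])

# Query 2: filter by anatomical term through config2.
by_term = build_query(expression, "config2", "anatomical_term", "term X", ["xxxxxxx"])

print(by_gene)
print(by_term)
```

Only the config and the pointer filter change between the two calls. The list of datasets stays the same.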

These queries are similar to the following one which is a real query from the 
ICGC data portal. This query gives you the methylation results of two cancers 
for genes involved in 'Apoptosis' pathway.


<Dataset name="hsapiens_gene_ensembl_tcgaREAD,hsapiens_gene_ensembl_tcgaSTAD"
         config="gene_ensembl_config">
  <!-- this is a pointer filter from the pathway dataset; the pathway and gene
       datasets are linked via ensembl_gene_id -->
  <Filter name="_displayname" value="Apoptosis"/>
  <Attribute name="cancertype"/>
  <Attribute name="ensembl_gene_id"/>
  <Attribute name="hsapiens_gene_ensembl__methylation__dm__tumour_sample_id"/>
  <Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_1"/>
  <Attribute name="hsapiens_gene_ensembl__methylation__dm__percent_methylation_2"/>
</Dataset>

Let me know if that makes sense, or if I have completely missed the point.

Cheers,
Junjun


From: Damian Smedley <[email protected]>
Date: Thu, 26 May 2011 09:14:55 -0400
To: "[email protected]" <[email protected]>
Subject: [BioMart Users] setting up multiple links between the same datasets

Hi,

Just helping set up some new expression database BioMarts. We want to have one 
config where the datasets are linked by gene for the use case "Give me all 
results from all the datasets for gene X".  But we also want to have another 
config linked by anatomical term to satisfy the query "Give me all results from 
all the datasets for anatomical term X"

But linking seems to be set up at the dataset rather than config level? Is 
there a way round this?

Thanks
Damian



_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
