Hi Isaac,

I see why you are still getting the intersection. The query XML shows that you
retrieve your local dataset (CDKdb_proto) through the Ensembl gene dataset
(Dataset name="hsapiens_gene_ensembl"). That is also why the column is repeated
in the SQL: one copy is used for the join.
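As a rough illustration of why a query that only asks for a local attribute can still come back as an intersection, the sketch below models a query that reaches the local dataset through a linked dataset. It works in three steps. First, the local table is read with the ID column selected twice, one copy for output and one as the join key, just as the logged SQL does. Second, only rows whose key also exists on the Ensembl side are kept. Third, only the requested attribute is returned. This is a conceptual model under those assumptions, not BioMart's actual implementation. The function name and the probe IDs are hypothetical placeholders.

```python
# Conceptual model only: this is NOT how BioMart is implemented internally.
# The function name and the IDs below are hypothetical placeholders.

def query_through_linked_dataset(local_ids, ensembl_ids):
    """Mimic a query on a local dataset that is reached via a linked dataset."""
    # Step 1: like the logged SQL, select the ID column twice:
    # one copy is the user-visible attribute, the other is the join key.
    rows = [(affy_id, affy_id) for affy_id in local_ids]

    # Step 2: an inner join keeps only rows whose key also exists on the
    # linked (Ensembl) side, so unmatched local IDs silently disappear.
    linked = set(ensembl_ids)
    joined = [(output, key) for output, key in rows if key in linked]

    # Step 3: only the requested attribute is returned, so the duplicated
    # join-key column never shows up in the output.
    return [output for output, _key in joined]


local = ["probe_1", "probe_2", "probe_3", "probe_4"]
ensembl = ["probe_2", "probe_4", "probe_9"]
print(query_through_linked_dataset(local, ensembl))  # ['probe_2', 'probe_4']
```

In this model the output column appears only once and the row count drops to the overlap. This matches the two symptoms reported in the thread: the SQL repeats the column while the output shows it once, and fewer affy IDs come back than are stored locally.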

To verify this, you can create another access point in MartConfigurator just for
your own dataset.

Let me know if you have any further questions.

Cheers,
Junjun

From: Isaac cano [mailto:[email protected]]
Sent: Thursday, July 21, 2011 04:44 AM
To: Junjun Zhang
Cc: BioMart Users <[email protected]>
Subject: Re: [BioMart Users] Importing from sources

Dear Junjun,

I've looked into the request log file and the query is the following:

2011-07-21 10:02:18,842 INFO  [4219289@qtp-914691-6:Log.java:164]: Incoming XML query: <!DOCTYPE Query><Query client="biomartclient" processor="TSVX" limit="1000" header="1"><Dataset name="hsapiens_gene_ensembl" config="hsapiens_gene_ensembl_config"><Attribute name="CDKdb_proto__DiffExpression__main__gene_affy_id_101"/></Dataset></Query>
Source: SELECT ckd.CDKdb_proto__DiffExpression__main.gene_affy_id, ckd.CDKdb_proto__DiffExpression__main.gene_affy_id FROM ckd.CDKdb_proto__DiffExpression__main
2011-07-21 10:02:19,260 INFO  [pool-1-thread-1:Log.java:164]: using linkindices
2011-07-21 10:02:19,261 INFO  [pool-1-thread-1:Log.java:164]: looking for index file in fileSystem (only first time trip): hsapiens_gene_ensembl_hsapiens_gene_ensembl__CDKdb_proto__DiffExpression__main_ckd.txt
2011-07-21 10:02:19,261 INFO  [pool-1-thread-1:Log.java:164]: GETWD: /root/Desktop/biomartrc6/dist
2011-07-21 10:02:19,261 INFO  [pool-1-thread-1:Log.java:164]: reading indices from fileSystem: hsapiens_gene_ensembl_hsapiens_gene_ensembl__CDKdb_proto__DiffExpression__main_ckd.txt
2011-07-21 10:02:19,262 ERROR [pool-1-thread-1:Log.java:208]: index file not found under: /registry/linkindices/
Index NOT used!
2011-07-21 10:02:32,062 INFO  [4219289@qtp-914691-6:Log.java:164]: Total query time is 13080 ms

The first thing I find strange is that the SELECT contains the
ckd.CDKdb_proto__DiffExpression__main.gene_affy_id attribute twice, but the
query output only shows it once. As you can see, there is no Ensembl attribute
involved in the query, yet I don't get all the affy IDs stored in the local
database (there are 816 affy IDs and I get 574). Do you think the "index file
not found" issue could be causing this?

Thanks again!

Isaac

2011/7/21 Junjun Zhang <[email protected]>
Dear Isaac,

Your use case is well taken. Other users are trying to do similar things. It is
a matter of inner join vs. left join. The current behaviour in BioMart is that
joins are inner joins, i.e., only the intersection is returned in the result.
It is currently not possible to alter this behaviour. We are thinking of
introducing a flag in the configuration to let the deployer control the join
behaviour.
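To make the inner join vs. left join distinction concrete, here is a minimal plain-SQL sketch using Python's built-in sqlite3 module. The table names, column names and values are hypothetical stand-ins for a local affy ID table and an Ensembl-side mapping. It illustrates generic SQL semantics only, not a BioMart setting, since the join type cannot currently be changed in BioMart.

```python
import sqlite3

# Hypothetical tables: a local expression table and an Ensembl-side mapping.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE local_expr (gene_affy_id TEXT PRIMARY KEY);
    CREATE TABLE ensembl_map (gene_affy_id TEXT PRIMARY KEY, ensembl_gene_id TEXT);
    INSERT INTO local_expr VALUES ('probe_1'), ('probe_2'), ('probe_3');
    INSERT INTO ensembl_map VALUES ('probe_2', 'ENSG_A'), ('probe_3', 'ENSG_B');
    """
)

# Inner join: only affy IDs present on both sides are returned (the intersection).
inner = conn.execute(
    """
    SELECT l.gene_affy_id, m.ensembl_gene_id
    FROM local_expr AS l
    JOIN ensembl_map AS m ON m.gene_affy_id = l.gene_affy_id
    ORDER BY l.gene_affy_id
    """
).fetchall()

# Left join: every local row survives; missing Ensembl values come back as NULL (None).
left = conn.execute(
    """
    SELECT l.gene_affy_id, m.ensembl_gene_id
    FROM local_expr AS l
    LEFT JOIN ensembl_map AS m ON m.gene_affy_id = l.gene_affy_id
    ORDER BY l.gene_affy_id
    """
).fetchall()

print(inner)  # [('probe_2', 'ENSG_A'), ('probe_3', 'ENSG_B')]
print(left)   # [('probe_1', None), ('probe_2', 'ENSG_A'), ('probe_3', 'ENSG_B')]
```

The left join result is what the original request asks for: all local rows, with an empty ("no data") value where Ensembl has no match.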

You mentioned that even though you did not include any Ensembl attribute in the
query, you still get only the intersection. This is strange; is there any
Ensembl filter used in the query? As you would imagine, if a query only involves
attributes/filters from one dataset, there shouldn't be any join at all. To
diagnose the problem further, you may want to look into the log and find out how
the query is executed.

Let me know what you find.

Cheers,
Junjun


From: Isaac cano <[email protected]>
Date: Wed, 20 Jul 2011 11:33:04 -0400
To: BioMart Users <[email protected]>
Subject: [BioMart Users] Importing from sources

Dear BioMart users,

I'm using BioMart to annotate a local database by importing several attributes
from the Ensembl mart (Homo sapiens). I've created a configuration for the
Ensembl source and imported the local attributes into it.

The general point is that we would like to retrieve all the data stored in the
local database plus other attributes from different sources, like Ensembl, where
they exist. Where they don't, we would like to still retrieve the local data and
get "no data" in those Ensembl attributes that do not exist for the selected
local attributes. Is this possible?

The issue is that when I query only the local attributes (in this case the
"affy id" attribute), the results only contain those affy IDs that are also
present in Ensembl. I would agree with this result if I had included attributes
from the Ensembl mart in the query, but this is not the case. So the question
is: is BioMart taking the intersection between the two sources by default when
creating a link between them? Is there a way to get the union instead of the
intersection?

Thanks in advance,

--
Isaac Cano
Bioinformatics
Linkcare Health Services SL
C/Villarroel 170
08036 - Barcelona
Tel.: (+34)932 275 400, ext. 4182\4523
Mobile: (+34) 666 186 748
Fax: (+34) 932 275 455
[email protected]


_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
