Hi Isaac,

I see why it still gives you the intersection. The query XML shows that you retrieve your local dataset (CDKdb_proto) through the Ensembl gene dataset (Dataset name="hsapiens_gene_ensembl"), so a join with the Ensembl dataset is still performed. For the same reason there is a repeated column in the SQL: one copy is there for join purposes.
To verify this you can create another access point in martconfigurator just for your own dataset. Let me know if you have any further questions.

Cheers,
Junjun

Sent from my BBerry

From: Isaac cano [mailto:[email protected]]
Sent: Thursday, July 21, 2011 04:44 AM
To: Junjun Zhang
Cc: BioMart Users <[email protected]>
Subject: Re: [BioMart Users] Importing from sources

Dear Junjun,

I've looked into the request log file and the query is the following:

2011-07-21 10:02:18,842 INFO [4219289@qtp-914691-6:Log.java:164]: Incoming XML query: <!DOCTYPE Query><Query client="biomartclient" processor="TSVX" limit="1000" header="1"><Dataset name="hsapiens_gene_ensembl" config="hsapiens_gene_ensembl_config"><Attribute name="CDKdb_proto__DiffExpression__main__gene_affy_id_101"/></Dataset></Query>
Source: SELECT ckd.CDKdb_proto__DiffExpression__main.gene_affy_id, ckd.CDKdb_proto__DiffExpression__main.gene_affy_id FROM ckd.CDKdb_proto__DiffExpression__main
2011-07-21 10:02:19,260 INFO [pool-1-thread-1:Log.java:164]: using linkindices
2011-07-21 10:02:19,261 INFO [pool-1-thread-1:Log.java:164]: looking for index file in fileSystem (only first time trip): hsapiens_gene_ensembl_hsapiens_gene_ensembl__CDKdb_proto__DiffExpression__main_ckd.txt
2011-07-21 10:02:19,261 INFO [pool-1-thread-1:Log.java:164]: GETWD: /root/Desktop/biomartrc6/dist
2011-07-21 10:02:19,261 INFO [pool-1-thread-1:Log.java:164]: reading indices from fileSystem: hsapiens_gene_ensembl_hsapiens_gene_ensembl__CDKdb_proto__DiffExpression__main_ckd.txt
2011-07-21 10:02:19,262 ERROR [pool-1-thread-1:Log.java:208]: index file not found under: /registry/linkindices/ Index NOT used!
2011-07-21 10:02:32,062 INFO [4219289@qtp-914691-6:Log.java:164]: Total query time is 13080 ms

The first thing that I find strange is that the SELECT contains the ckd.CDKdb_proto__DiffExpression__main.gene_affy_id attribute twice, but the output of the query only shows it once.
As you can see, there is no Ensembl attribute involved in the query, but I don't get all the affy ids stored in the local database (there are 816 affy ids and I get 574). Do you think that the "index file not found" issue could be causing this?

Thanks again!
Isaac

2011/7/21 Junjun Zhang <[email protected]<mailto:[email protected]>>

Dear Isaac,

Your use case is well taken here. We have other users who are trying to do similar things. It's a matter of inner join vs. left join. The current behaviour in BioMart is that the joins are inner joins, i.e., only the intersection is returned in the result. It is currently not possible to alter this behaviour. We are thinking of introducing a flag in the configuration to let the deployer control the join behaviour.

You mentioned that even though you did not include any Ensembl attribute in the query, you still get only the intersection. This is strange. Is there any Ensembl filter used in the query? As you would imagine, if a query only involves attributes/filters from one dataset, there shouldn't be any join at all. To further diagnose the problem, you may want to look into the log and find out how the query is executed. Let me know what you find.

Cheers,
Junjun

From: Isaac cano <[email protected]<mailto:[email protected]>>
Date: Wed, 20 Jul 2011 11:33:04 -0400
To: BioMart Users <[email protected]<mailto:[email protected]>>
Subject: [BioMart Users] Importing from sources

Dear BioMart users,

I'm using BioMart to annotate a local database by importing several attributes from the Ensembl mart (homo sapiens). So far I've created a configuration for the Ensembl source and imported the local attributes into it. The general point is that we would like to retrieve all the data stored in the local database plus attributes from other sources such as Ensembl where they exist. Where they do not exist, we would still like to retrieve the local data and get "no data" in the Ensembl attributes that are missing for the selected local attributes. Is this possible?
The issue is that when I query only the local attributes (in this case the "affy id" attribute), the results only contain those "affy ids" that are also present in Ensembl. I would agree with this result if I had included attributes from the Ensembl mart in the query, but this is not the case. So the question is the following: does BioMart by default take the intersection of the two sources when a link is created between them? Is there a way to get the union instead of the intersection?

Thanks in advance,

--
Isaac Cano
Bioinformatics
Linkcare Health Services SL
C/Villarroel 170
08036 - Barcelona
Tel.: (+34)932 275 400, ext. 4182\4523
Mobile: (+34) 666 186 748
Fax: (+34) 932 275 455
[email protected]<mailto:[email protected]>
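For readers of the archive, the inner join vs. left join difference discussed in this thread can be illustrated outside BioMart. The following is only a minimal sketch using Python's built-in sqlite3 module. The table names (local_expr, ensembl_xref), column names, and ids are hypothetical and chosen for illustration; this is not the SQL that BioMart generates. It shows why an inner join drops local ids that have no Ensembl match, while a left join keeps them with an empty Ensembl value.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE local_expr (gene_affy_id TEXT);
        CREATE TABLE ensembl_xref (affy_id TEXT, ensembl_gene_id TEXT);
        -- Three local ids, only two of which have a match on the Ensembl side.
        INSERT INTO local_expr VALUES ('affy_A'), ('affy_B'), ('affy_C');
        INSERT INTO ensembl_xref VALUES ('affy_A', 'gene_1'), ('affy_C', 'gene_3');
        """
    )
    return conn


def inner_join_ids(conn):
    """Local ids returned when the local table is inner-joined to Ensembl."""
    rows = conn.execute(
        """
        SELECT l.gene_affy_id
        FROM local_expr AS l
        INNER JOIN ensembl_xref AS e ON e.affy_id = l.gene_affy_id
        ORDER BY l.gene_affy_id
        """
    ).fetchall()
    return [row[0] for row in rows]


def left_join_rows(conn):
    """All local ids, with None where no Ensembl match exists."""
    return conn.execute(
        """
        SELECT l.gene_affy_id, e.ensembl_gene_id
        FROM local_expr AS l
        LEFT JOIN ensembl_xref AS e ON e.affy_id = l.gene_affy_id
        ORDER BY l.gene_affy_id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print("inner join:", inner_join_ids(conn))
    print("left join: ", left_join_rows(conn))
```

In this toy data the inner join returns only the ids present on both sides, even though no Ensembl column is selected, which mirrors the 574 of 816 result reported above. The left join returns every local id and leaves the Ensembl column empty where there is no match.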
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
