Hi Junjun,

You are right: if I create another access point for my own database, I can get all the stored affy_ids. What I was doing was having two datasets (my own local dataset and the Ensembl mart), creating one access point for Ensembl, and expanding it by importing the attributes from my own database. Considering your comments, I imagine that with this setup BioMart always does an inner join, whereas I would like to get left joins. I'm really going to wait until the BioMart developers release such a flag!

Thanks for your help!

Isaac

2011/7/21 Junjun Zhang <[email protected]>

> Hi Isaac,
>
> I see the reason why it still gives your intersection. The query XML shows
> you retrieve your local dataset (CDKdb_proto) through ensembl gene dataset
> (Dataset name="hsapiens_gene_ensembl"). And for the same reason there are
> repeated column in SQL, one is for join purpose.
>
> To verify this you can create another access point in martconfigurator just
> for you own dataset.
>
> Let me know if you have any further questions.
>
> Cheers,
> Junjun
> Sent from my BBerry
>
> *From*: Isaac cano [mailto:[email protected]]
> *Sent*: Thursday, July 21, 2011 04:44 AM
> *To*: Junjun Zhang
> *Cc*: BioMart Users <[email protected]>
> *Subject*: Re: [BioMart Users] Importing from sources
>
> Dear Junjun,
>
> I've looked into the request log file and the query is the following:
>
> 2011-07-21 10:02:18,842 INFO [4219289@qtp-914691-6:Log.java:164]:
> Incoming XML query: <!DOCTYPE Query><Query client="biomartclient"
> processor="TSVX" limit="1000" header="1"><Dataset
> name="hsapiens_gene_ensembl"
> config="hsapiens_gene_ensembl_config"><Attribute
> name="CDKdb_proto__DiffExpression__main__gene_affy_id_101"/></Dataset></Query>
> Source: SELECT *ckd.CDKdb_proto__DiffExpression__main.gene_affy_id*, *
> ckd.CDKdb_proto__DiffExpression__main.gene_affy_id* FROM
> ckd.CDKdb_proto__DiffExpression__main
> 2011-07-21 10:02:19,260 INFO [pool-1-thread-1:Log.java:164]: using
> linkindices
> 2011-07-21 10:02:19,261 INFO [pool-1-thread-1:Log.java:164]: looking for
> index file in fileSystem (only first time trip):
> hsapiens_gene_ensembl_hsapiens_gene_ensembl__CDKdb_proto__DiffExpression__main_ckd.txt
> 2011-07-21 10:02:19,261 INFO [pool-1-thread-1:Log.java:164]: GETWD:
> /root/Desktop/biomartrc6/dist
> 2011-07-21 10:02:19,261 INFO [pool-1-thread-1:Log.java:164]: reading
> indices from fileSystem:
> hsapiens_gene_ensembl_hsapiens_gene_ensembl__CDKdb_proto__DiffExpression__main_ckd.txt
> 2011-07-21 10:02:19,262 ERROR [pool-1-thread-1:Log.java:208]:* index file
> not found* under: /registry/linkindices/
> Index NOT used!
> 2011-07-21 10:02:32,062 INFO [4219289@qtp-914691-6:Log.java:164]: Total
> query time is 13080 ms
>
> The first think that I find strange is that the SELECT contains twice the
> *ckd.CDKdb_proto__DiffExpression__main.gene_affy_id *attribute but the
> output of the query only shows it once. As you can see there is no ensembl
> attribute involved in the query but I don't get all the affy ids stored in
> the local database (There are 816 affy ids and I get 574). Do you think that
> the index file not found issue could be causing this?
>
> Thanks again!
>
> Isaac
>
> 2011/7/21 Junjun Zhang <[email protected]>
>
>> Dear Isaac,
>>
>> Your use case is well taken here. We have users are trying to do similar
>> things. It's a matter of inner join v.s. left join. The current behaviour in
>> BioMart is that the joins are inner join, ie, only intersection will be
>> returned in the result. It is currently not possible to alter this
>> behaviour. We are thinking to introduce a flag in the configuration to let
>> deployer control the join behaviour.
>>
>> You mentioned that even you did not include any ensembl attribute in the
>> query, you still get only the intersection. This is strange, is there any
>> ensembl filter used in the query? As you would imagine, if a query only
>> involves attribute/filter from one dataset, there shouldn't be any join at
>> all. To further diagnose the problem, you may want to look into the log and
>> find out how query is executed.
>>
>> Let me know how you find.
>>
>> Cheers,
>> Junjun
>>
>>
>> From: Isaac cano <[email protected]>
>> Date: Wed, 20 Jul 2011 11:33:04 -0400
>> To: BioMart Users <[email protected]>
>> Subject: [BioMart Users] Importing from sources
>>
>> Dear BioMart users,
>>
>> I'm dealing with BioMart to annotate a local database by importing several
>> attributes from the Ensembl mart (homo sapiens). Now I've created a
>> configuration for the Ensembl source and imported the local attributes to
>> it.
>>
>> The general point is that we would like to retrieve all the data stored in
>> the local database plus other attributes from different sources like Ensembl
>> in case they exist, in case not we would like to still retrieve the local
>> data and getting "no data" in those Ensembl attributes that do not exist for
>> the selected local attributes. Is this possible?
>>
>> The issue is that when I query only the local attributes (in this case the
>> "affy id" attribute) the results only contain those "affy ids" that are also
>> present on Ensembl. I would agree with this result if I would have included
>> in the query attributes from the Ensembl mart but this is not the case. Then
>> the question is the following: Is BioMart by default making the intersection
>> between the two sources when creating a link between them? Is there a way to
>> get the union instead of the intersection?
>>
>> Thanks in advance,
>>
>> --
>> Isaac Cano
>> Bioinformatics
>> Linkcare Health Services SL
>> C/Villarroel 170
>> 08036 - Barcelona
>> Tel.: (+34)932 275 400, ext. 4182\4523
>> Mobile: (+34) 666 186 748
>> Fax: (+34) 932 275 455
>> [email protected]
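
For readers following the thread, the difference between the inner join that BioMart is described as performing and the left join Isaac is asking for can be sketched with a toy SQLite example. The table names, column names, and values below are hypothetical stand-ins, not the actual CDKdb_proto or Ensembl schema, and the SQL is not what BioMart itself generates.

```python
import sqlite3

# Toy stand-ins for the two sources. All names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE local_expr (affy_id TEXT PRIMARY KEY);
    CREATE TABLE ensembl_gene (affy_id TEXT, ensembl_gene_id TEXT);
    INSERT INTO local_expr VALUES ('probe_a'), ('probe_b'), ('probe_c');
    INSERT INTO ensembl_gene VALUES ('probe_a', 'gene_1'), ('probe_c', 'gene_3');
""")

# Inner join: only local rows that also have a match on the Ensembl side.
inner_rows = conn.execute("""
    SELECT l.affy_id, e.ensembl_gene_id
    FROM local_expr AS l
    INNER JOIN ensembl_gene AS e ON e.affy_id = l.affy_id
    ORDER BY l.affy_id
""").fetchall()

# Left join: every local row, with NULL (None) where Ensembl has no match.
left_rows = conn.execute("""
    SELECT l.affy_id, e.ensembl_gene_id
    FROM local_expr AS l
    LEFT JOIN ensembl_gene AS e ON e.affy_id = l.affy_id
    ORDER BY l.affy_id
""").fetchall()

print(inner_rows)  # probe_b is missing
print(left_rows)   # probe_b is present with None for the Ensembl column
```

In this sketch the inner join silently drops `probe_b`, which mirrors getting 574 of 816 affy ids when the local attribute is retrieved through the Ensembl dataset, while the left join keeps it with an empty Ensembl value.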
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
