Hi Arek,

Can you please elaborate more on the follow point mentioned in your previous 
response?

"There is one more situation involving inlist that has not
been solved either ei annotating external identifiers that effectively is
an inner join as well and has been brought to our attention before."

Thanks,
Junjun


From: Arek Kasprzyk <[email protected]<mailto:[email protected]>>
Date: Tue, 6 Sep 2011 13:29:01 -0400
To: jzhang <[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [BioMart Users] OUTER JOIN between tables

Junjun:
my three cents ;)

1. I would definitely NOT expose end users to understand the querying logic for 
inter mart queries.
2. left join with inter mart querying needs to be solved with in list 
annotation at the same time
3. The 'on the fly' left join was tested two years ago when i asked for the 
query compiler to be extended so we know well how it performs :)

a.






On Tue, Sep 6, 2011 at 12:53 PM, Junjun Zhang 
<[email protected]<mailto:[email protected]>> wrote:
Hi Nuria,

Your use case is well understood. As left-join is a more typical scenario for 
inter-mart queries, we are considering to make left-join the default behaviour 
in the next release. Technically this is not hard to implement, however, we'd 
like to be very careful about making this type of decision as it has global 
impact to the whole querying system. In this case, we will need to provide 
means to enable deployers (maybe end users as well) to alter the default join 
behaviour from the web GUI and the query XML.

For join with a dataset between main table and dimension table, we can continue 
to support precomputed 'left join' as Arek mentioned below. We can also make it 
an optional for performing 'left join' on the fly. The only concern is that 
under certain situations the latter may have some performance disadvantages.

We will keep you informed on this.

Hope this helps!

Best regards,
Junjun


From: Arek Kasprzyk <[email protected]<mailto:[email protected]>>
Date: Mon, 5 Sep 2011 09:55:42 -0400
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [BioMart Users] OUTER JOIN between tables

Hi Nuria,
this has been a well know problem (or a feature depends how you look at it :)) 
of BioMart from it's beginnings. BioMart 0.7 performs an inner join only 
between tables within single dataset. The join between marts depends on data 
overlap between them and can behave either as an outer join when there is 100% 
overlap or as inner join if there is no full overlap (more typical scenario). 
This is of course not an ideal situation because it looks somewhat 
non-deterministic for the administrators.

The mechanism of both joins (dataset and mart) is different. While the former 
is a simple SQL inner join, the latter is an 'inlist' that is an equivalent to 
multiple 'or' statements. The effect of this implementation was that all 
dimension tables in a dataset had to be a precomputed 'left join' tables ei 
tables with multiple NULLs which is not ideal for large tables with sparse 
records wasting a lot of space.

BioMart 0.8 has inherited most solutions from 0.7. However it is supposed to be 
more flexible and perform an inner or outer join within the dataset depending 
on the administrator's settings that should available through MartConfigurator 
and also automatically set based on the initial MBuilder components settings. I 
do remember personally instructing one of the developers to include this 
feature in the QueryCompiler and it was there as a simple switch at some point 
but i doubth it that it ever made it to MConfigurator or MBuilder.

The situation with the join between marts in 0.8 is identical to that of 0.7 
(ei there are no provisions for the outer join). However, your email alerted me 
to this fact and this is definitely something that we need to think about. 
There is one more situation involving inlist that has not been solved either ei 
annotating external identifiers that effectively is an inner join as well and 
has been brought to our attention before. Technically the solution for both 
should be the same. This is definitely doable and something worth scheduling in 
the near future.


Junjun: please check where we are with the DATASET outer join implementation.


cheers,
a



On Mon, Sep 5, 2011 at 7:29 AM, 
<[email protected]<mailto:[email protected]>> wrote:
Dear BioMart users,

We are setting up a local database powered by BioMart inorder to annotate our 
local data with the data of other marts. The problem is that the type of join 
that BioMart is performing between tables seems to be an inner join since 
BioMart solely outputs all the records that match in the two marts linked. My 
question is, if this is true, are you planning to add outer joins in order to 
get in addition all the unmatched results? In our case, if we want to annotate 
our database with another mart data, we would perform a left join to get all 
the matched records + all the unmatched records from our database. And in case 
you are planning to implement this, when do you predict that this 
implementation will be accessible for us, the BioMart users?

This type of outputting the query results, i.e. to output both the matched and 
unmatched records from our database, is of highly interest in order to annotate 
BUT without losing information from the query because of the annotation 
process. Many thanks in advance.

Please, do not hesitate to ask me for more information.

Yours faithfully,

NĂºria



_______________________________________________
Users mailing list
[email protected]<mailto:[email protected]>
https://lists.biomart.org/mailman/listinfo/users



_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users

Reply via email to