Hi Nuria,
this has been a well know problem (or a feature depends how you look at it
:)) of BioMart from it's beginnings. BioMart 0.7 performs an inner join only
between tables within single dataset. The join between marts depends on data
overlap between them and can behave either as an outer join when there is
100% overlap or as inner join if there is no full overlap (more typical
scenario). This is of course not an ideal situation because it looks
somewhat non-deterministic for the administrators.

The mechanism of both joins (dataset and mart) is different. While the
former is a simple SQL inner join, the latter is an 'inlist' that is an
equivalent to multiple 'or' statements. The effect of this implementation
was that all dimension tables in a dataset had to be a precomputed 'left
join' tables ei tables with multiple NULLs which is not ideal for large
tables with sparse records wasting a lot of space.

BioMart 0.8 has inherited most solutions from 0.7. However it is supposed to
be more flexible and perform an inner or outer join within the dataset
depending on the administrator's settings that should available through
MartConfigurator and also automatically set based on the initial MBuilder
components settings. I do remember personally instructing one of the
developers to include this feature in the QueryCompiler and it was there as
a simple switch at some point but i doubth it that it ever made it to
MConfigurator or MBuilder.

The situation with the join between marts in 0.8 is identical to that of 0.7
(ei there are no provisions for the outer join). However, your email alerted
me to this fact and this is definitely something that we need to think
about. There is one more situation involving inlist that has not been solved
either ei annotating external identifiers that effectively is an inner join
as well and has been brought to our attention before. Technically the
solution for both should be the same. This is definitely doable and
something worth scheduling in the near future.


Junjun: please check where we are with the DATASET outer join
implementation.


cheers,
a



On Mon, Sep 5, 2011 at 7:29 AM, <[email protected]> wrote:

>  Dear BioMart users,****
>
> ** **
>
> We are setting up a local database powered by BioMart in order to annotate
> our local data with the data of other marts. The problem is that the type of
> join that BioMart is performing between tables seems to be an inner join
> since BioMart solely outputs all the records that match in the two marts
> linked. My question is, if this is true, are you planning to add outer joins
> in order to get in addition all the unmatched results? In our case, if we
> want to annotate our database with another mart data, we would perform a
> left join to get all the matched records + all the unmatched records from
> our database. And in case you are planning to implement this, when do you
> predict that this implementation will be accessible for us, the BioMart
> users?****
>
> ** **
>
> This type of outputting the query results, i.e. to output both the matched
> and unmatched records from our database, is of highly interest in order to
> annotate BUT without losing information from the query because of the
> annotation process. Many thanks in advance.****
>
> ** **
>
> Please, do not hesitate to ask me for more information. ****
>
> ** **
>
> Yours faithfully,****
>
> ** **
>
> NĂºria****
>
> ** **
>
> ** **
>
> _______________________________________________
> Users mailing list
> [email protected]
> https://lists.biomart.org/mailman/listinfo/users
>
>
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users

Reply via email to