Hi Thomas,

I figured out what goes wrong on the R side for the following query:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE Query>
<Query virtualSchemaName = 'default' uniqueRows = '1' count = '0'
       datasetConfigVersion = '0.6' header = '1' requestid = 'biomaRt'>
  <Dataset name = 'hsapiens_gene_ensembl'>
    <Attribute name = 'ensembl_gene_id'/>
    <Attribute name = 'hsapiens_paralog_ensembl_gene'/>
    <Attribute name = 'hsapiens_paralog_perc_id'/>
    <Attribute name = 'hsapiens_paralog_perc_id_r1'/>
    <Filter name = 'ensembl_gene_id' value = 'ENSG00000001561'/>
  </Dataset>
</Query>

When a result comes back from the BioMart server, I map the header names
(e.g. "Ensembl Gene ID") back to the attribute names that were used in the
query (e.g. ensembl_gene_id), and I return a matrix to the user labelled
with the attribute names instead of the attribute descriptions.

However, in the case of the hsapiens_paralog_perc_id attributes, the header
in the results from the BioMart server is "% Identity with respect to query
gene".  As there are many attribute names with this same description, I
cannot map the header back to the original attribute name, and the R query
crashes.
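
To illustrate the ambiguity, here is a minimal sketch in base R.  It does
not use biomaRt's actual internals: attr_table, header and candidates are
made-up names, and the table only holds three attributes taken from this
thread.

```r
# Hypothetical stand-in for the attribute table (name -> description).
# Not biomaRt's real data structure, just three rows from this thread.
attr_table <- data.frame(
  name = c("ensembl_gene_id",
           "hsapiens_paralog_perc_id",
           "mmusculus_homolog_perc_id"),
  description = c("Ensembl Gene ID",
                  "% Identity with respect to query gene",
                  "% Identity with respect to query gene"),
  stringsAsFactors = FALSE
)

# Header line as it might come back from the server for a two-column query.
header <- c("Ensembl Gene ID", "% Identity with respect to query gene")

# Naive reverse lookup: for each header, collect every attribute name
# whose description matches it.
candidates <- lapply(header, function(h) {
  attr_table$name[attr_table$description == h]
})
names(candidates) <- header

# A header is ambiguous when more than one attribute name shares it.
ambiguous <- vapply(candidates, length, integer(1)) > 1
print(ambiguous)
```

The first header maps to exactly one attribute name, while the second maps
to two, so a lookup by description alone cannot tell which one was queried.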

Is there a way to write my XML query so that I get the attribute name back
in the header instead of the attribute description, so I don't have to map
things back?

Cheers,
Steffen



On Wed, May 22, 2013 at 2:55 PM, Steffen Durinck <[email protected]> wrote:

> Hi Thomas, Benjamin,
>
> The problem is on the R side: the "%" symbol in the attribute names (%
> Identity with respect to query gene) is causing trouble. I am working on
> a fix.  Until then you can add bmHeader=FALSE to your getBM query and
> things should work (see below):
>
> > human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> > attributes = c("ensembl_gene_id", "mmusculus_homolog_ensembl_gene",
>     "mmusculus_homolog_perc_id_r1")
> > attributes = c(attributes, "mmusculus_homolog_orthology_type",
>     "mmusculus_homolog_subtype", "mmusculus_homolog_perc_id")
> > orth.mouse = getBM(attributes, filters = "with_homolog_mmus",
>     values = TRUE, mart = human, bmHeader = FALSE)
> > dim(orth.mouse)
> [1] 22886     6
>
> Best,
> Steffen
>
>
>
> On Wed, May 22, 2013 at 7:05 AM, Thomas Maurel <[email protected]> wrote:
>
>> Dear Benjamin,
>>
>> I can't see what's wrong with your query, but it looks like the issue is
>> coming from the following attributes:
>>  "mmusculus_homolog_orthology_type"
>> "mmusculus_homolog_subtype"
>> "mmusculus_homolog_perc_id"
>>
>> If you do the following query you will get the same error back:
>>
>> > library(biomaRt)
>> > human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>> > attributes =
>> c("ensembl_gene_id","mmusculus_homolog_ensembl_gene","mmusculus_homolog_orthology_type")
>> > orth.mouse = getBM(attributes,
>>  filters="with_homolog_mmus",values=TRUE, mart = human, uniqueRows=TRUE)
>> Error in `[.data.frame`(result, , attributes) :
>>   undefined columns selected
>>
>> It's when you start adding one of the attributes listed above that the
>> query fails.
>> I have also tried your query with the host pointing to ensembl.org and I
>> am getting the same error.
>> Since the query is working fine on the biomart interface, this might be a
>> bug coming from the biomaRt package. I would advise you to email the
>> bioconductor list: [email protected]
>>
>> Hope this helps,
>> Regards,
>> Thomas
>> On 22 May 2013, at 09:47, Benjamin Dubreuil wrote:
>>
>> Hi folks,
>>
>> I'm having a problem with the getBM function.
>> I would like to retrieve the orthologous genes between Human and Mouse,
>> with their percentage of identity to one another, their orthology
>> relationship and their common ancestor.
>>
>> I did this, and it works fine:
>>
>> >library(biomaRt)
>> >human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>> >attributes =
>> c("ensembl_gene_id","mmusculus_homolog_ensembl_gene","mmusculus_homolog_perc_id_r1")
>> > orth.mouse = getBM(attributes,
>> filters="with_homolog_mmus",values=TRUE, mart = human, uniqueRows=TRUE)
>> > dim(orth.mouse)
>> [1] 22886     3
>> >
>>
>> But when I add some attributes, I get an error:
>>
>> >attributes=c(attributes,"mmusculus_homolog_orthology_type",
>> "mmusculus_homolog_subtype", "mmusculus_homolog_perc_id")
>> > orth.mouse = getBM( attributes,filters="with_homolog_mmus",values
>> =TRUE, mart = human)
>> Error in `[.data.frame`(result, , attributes) :
>>   undefined columns selected
>>
>>
>> I've checked the attribute names, and I didn't make any typos:
>> > listAttributes(human)[c(1,567,573:576),]
>>     name                              description
>> 1   ensembl_gene_id                   Ensembl Gene ID
>> 567 mmusculus_homolog_ensembl_gene    Mouse Ensembl Gene ID
>> 573 mmusculus_homolog_orthology_type  Homology Type
>> 574 mmusculus_homolog_subtype         Ancestor
>> 575 mmusculus_homolog_perc_id         % Identity with respect to query gene
>> 576 mmusculus_homolog_perc_id_r1      % Identity with respect to Mouse gene
>>
>> Can anyone see what I'm doing wrong?
>>
>>
>> Best.
>>
>> Dubreuil Benjamin
>> E. Levy Group (The Cell architecture Lab)
>> Weizmann Institute of Science, ISRAEL
>> Kimmelman Building, 4th floor, room 410
>> _______________________________________________
>> Users mailing list
>> [email protected]
>> https://lists.biomart.org/mailman/listinfo/users
>>
>>
>> --
>> Thomas Maurel
>> Bioinformatician - Ensembl Production Team
>> European Bioinformatics Institute (EMBL-EBI)
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge - CB10 1SD - UK
>>
>>
>>
>
>