Hi Thomas,

I figured out what goes wrong on the R side for the following query:
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE Query>
<Query virtualSchemaName = 'default' uniqueRows = '1' count = '0' datasetConfigVersion = '0.6' header='1' requestid= 'biomaRt'>
    <Dataset name = 'hsapiens_gene_ensembl'>
        <Attribute name = 'ensembl_gene_id'/>
        <Attribute name = 'hsapiens_paralog_ensembl_gene'/>
        <Attribute name = 'hsapiens_paralog_perc_id'/>
        <Attribute name = 'hsapiens_paralog_perc_id_r1'/>
        <Filter name = 'ensembl_gene_id' value = 'ENSG00000001561' />
    </Dataset>
</Query>

When a result comes back from the BioMart server, I map the header names (e.g. "Ensembl Gene ID") back to the attribute names used in the query (e.g. ensembl_gene_id). I then return a matrix to the user whose column headers are the attribute names rather than the attribute descriptions. However, the hsapiens_paralog_perc_id attributes come back from the BioMart server with the header "% Identity with respect to query gene". Many attributes share this same description, so I cannot map these headers back to the original attribute names, and the R query crashes.

Is there a way to write my XML query so that the header contains the attribute names instead of the attribute descriptions? Then I would not have to map them back.

Cheers,
Steffen

On Wed, May 22, 2013 at 2:55 PM, Steffen Durinck <[email protected]> wrote:
> Hi Thomas, Benjamin,
>
> The problem is on the R side: the "%" symbol in the attribute descriptions (% Identity with respect to query gene) is causing trouble. I am working on a fix.
> Until then, you can add bmHeader=FALSE to your getBM query and things should work (see below):
>
> > human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> > attributes = c("ensembl_gene_id","mmusculus_homolog_ensembl_gene","mmusculus_homolog_perc_id_r1")
> > attributes = c(attributes,"mmusculus_homolog_orthology_type", "mmusculus_homolog_subtype", "mmusculus_homolog_perc_id")
> > orth.mouse = getBM(attributes, filters="with_homolog_mmus", values=TRUE, mart = human, bmHeader=FALSE)
> > dim(orth.mouse)
> [1] 22886 6
>
> Best,
> Steffen
>
> On Wed, May 22, 2013 at 7:05 AM, Thomas Maurel <[email protected]> wrote:
>> Dear Benjamin,
>>
>> I can't see what's wrong with your query, but the issue seems to come from the following attributes:
>> "mmusculus_homolog_orthology_type"
>> "mmusculus_homolog_subtype"
>> "mmusculus_homolog_perc_id"
>>
>> If you run the following query, you will get the same error back:
>>
>> > library(biomaRt)
>> > human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>> > attributes = c("ensembl_gene_id","mmusculus_homolog_ensembl_gene","mmusculus_homolog_orthology_type")
>> > orth.mouse = getBM(attributes, filters="with_homolog_mmus", values=TRUE, mart = human, uniqueRows=TRUE)
>> Error in `[.data.frame`(result, , attributes) :
>>   undefined columns selected
>>
>> The query fails as soon as you add one of the attributes listed above.
>> I have also tried your query with the host pointing to ensembl.org, and I get the same error.
>> Since the query works fine in the BioMart interface, this might be a bug in the biomaRt package.
>> I would advise you to email the Bioconductor mailing list: [email protected]
>>
>> Hope this helps,
>> Regards,
>> Thomas
>>
>> On 22 May 2013, at 09:47, Benjamin Dubreuil wrote:
>>
>> Hi folks,
>>
>> I'm having a problem with the getBM function.
>> I would like to retrieve the orthologous genes between human and mouse, together with their percentage of identity to one another, their orthology relationship and their common ancestor.
>>
>> I did this, and it works fine:
>>
>> > library(biomaRt)
>> > human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>> > attributes = c("ensembl_gene_id","mmusculus_homolog_ensembl_gene","mmusculus_homolog_perc_id_r1")
>> > orth.mouse = getBM(attributes, filters="with_homolog_mmus", values=TRUE, mart = human, uniqueRows=TRUE)
>> > dim(orth.mouse)
>> [1] 22886 3
>>
>> But when I add some attributes, I get an error:
>>
>> > attributes = c(attributes,"mmusculus_homolog_orthology_type", "mmusculus_homolog_subtype", "mmusculus_homolog_perc_id")
>> > orth.mouse = getBM(attributes, filters="with_homolog_mmus", values=TRUE, mart = human)
>> Error in `[.data.frame`(result, , attributes) :
>>   undefined columns selected
>>
>> I've checked the attribute names, and I didn't make any typos:
>>
>> > listAttributes(human)[c(1,567,573:576),]
>>                                 name                           description
>> 1                    ensembl_gene_id                       Ensembl Gene ID
>> 567   mmusculus_homolog_ensembl_gene                 Mouse Ensembl Gene ID
>> 573 mmusculus_homolog_orthology_type                         Homology Type
>> 574        mmusculus_homolog_subtype                              Ancestor
>> 575        mmusculus_homolog_perc_id % Identity with respect to query gene
>> 576     mmusculus_homolog_perc_id_r1 % Identity with respect to Mouse gene
>>
>> Can anyone see what I'm doing wrong?
>>
>> Best,
>>
>> Dubreuil Benjamin
>> E. Levy Group (The Cell architecture Lab)
>> Weizmann Institute of Science, ISRAEL
>> Kimmelman Building, 4th floor, room 410
>> _______________________________________________
>> Users mailing list
>> [email protected]
>> https://lists.biomart.org/mailman/listinfo/users
>>
>> --
>> Thomas Maurel
>> Bioinformatician - Ensembl Production Team
>> European Bioinformatics Institute (EMBL-EBI)
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge - CB10 1SD - UK
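The header-mapping problem Steffen describes can be reproduced offline. Below is a minimal sketch. It assumes the description-to-name lookup is done with match(); it is not biomaRt's actual code. The lookup table and the helpers `naive_names` and `header_to_attributes` are hypothetical, and the dummy data row is made up for illustration. The positional fallback also assumes that the server returns columns in the same order as the requested attributes.

```r
# Hypothetical lookup table (name -> description). Following this thread,
# both identity attributes come back with the same header text.
attr_table <- data.frame(
  name = c("ensembl_gene_id",
           "hsapiens_paralog_perc_id",
           "hsapiens_paralog_perc_id_r1"),
  description = c("Ensembl Gene ID",
                  "% Identity with respect to query gene",
                  "% Identity with respect to query gene"),
  stringsAsFactors = FALSE
)

# Naive mapping: look each header up by its description. match() returns
# only the first hit, so duplicated descriptions map to the same name.
naive_names <- function(header, attr_table) {
  attr_table$name[match(header, attr_table$description)]
}

# Guarded mapping: map by description only when the requested attributes
# have unique descriptions; otherwise fall back to the requested names.
# ASSUMPTION: columns come back in the order the attributes were requested.
header_to_attributes <- function(header, attributes, attr_table) {
  stopifnot(length(header) == length(attributes))
  lookup <- attr_table[attr_table$name %in% attributes, ]
  if (anyDuplicated(lookup$description) > 0) {
    return(attributes)  # positional fallback
  }
  lookup$name[match(header, lookup$description)]
}

attributes <- attr_table$name
header <- attr_table$description  # headers as the server would send them
result <- data.frame(a = "ENSG00000001561", b = 1, c = 2)  # dummy row

naive_result <- result
colnames(naive_result) <- naive_names(header, attr_table)
# Both identity columns are now named "hsapiens_paralog_perc_id" and
# "hsapiens_paralog_perc_id_r1" is missing, so selecting the requested
# names fails with "undefined columns selected".
naive_try <- try(naive_result[, attributes], silent = TRUE)

fixed_result <- result
colnames(fixed_result) <- header_to_attributes(header, attributes, attr_table)
print(fixed_result[, attributes])
```

The positional fallback is only as safe as the column-order assumption. Having the server send attribute names directly, as Steffen asks, or using the bmHeader=FALSE workaround from this thread, avoids relying on that assumption.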
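Benjamin's manual check of the attribute names could also be scripted before calling getBM(). The helper `check_attributes` below is hypothetical and not part of biomaRt. It takes a table with `name` and `description` columns, the shape of the listAttributes() output in this thread. The rows are typed in from that output so the sketch runs offline. Besides unknown names, it flags descriptions shared by several requested attributes and descriptions containing "%", the two problems Steffen identified.

```r
# Rows copied from the listAttributes(human) output quoted in this thread.
mouse_attrs <- data.frame(
  name = c("ensembl_gene_id", "mmusculus_homolog_ensembl_gene",
           "mmusculus_homolog_orthology_type", "mmusculus_homolog_subtype",
           "mmusculus_homolog_perc_id", "mmusculus_homolog_perc_id_r1"),
  description = c("Ensembl Gene ID", "Mouse Ensembl Gene ID",
                  "Homology Type", "Ancestor",
                  "% Identity with respect to query gene",
                  "% Identity with respect to Mouse gene"),
  stringsAsFactors = FALSE
)

# Hypothetical pre-flight check; returns a list of possible problems.
check_attributes <- function(attributes, attr_table) {
  requested <- attr_table[attr_table$name %in% attributes, ]
  list(
    # Requested names that do not exist in the table (typos).
    unknown = setdiff(attributes, attr_table$name),
    # Descriptions shared by several requested attributes: such headers
    # cannot be mapped back to a single attribute name.
    shared_descriptions =
      unique(requested$description[duplicated(requested$description)]),
    # Requested attributes whose description contains a "%" character.
    percent_in_description =
      requested$name[grepl("%", requested$description, fixed = TRUE)]
  )
}

attributes <- c("ensembl_gene_id", "mmusculus_homolog_ensembl_gene",
                "mmusculus_homolog_perc_id_r1",
                "mmusculus_homolog_orthology_type",
                "mmusculus_homolog_subtype", "mmusculus_homolog_perc_id")
check <- check_attributes(attributes, mouse_attrs)
str(check)
```

For Benjamin's six attributes, `unknown` and `shared_descriptions` come back empty. `percent_in_description` lists the two perc_id attributes.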
