Hi Syed,

This order is not always preserved, though, when you run a query across two datasets. We had issues with that in the past; I'm not sure whether this still happens.
Cheers,
Steffen

On Thu, May 23, 2013 at 2:17 AM, Syed Haider <[email protected]> wrote:
> Hi Steffen,
>
> The order of header attributes should be the same as the order of
> attributes in the query you send. So, in theory, you can match these
> back!
>
> Syed
>
>
> On 22 May 2013 23:53, Steffen Durinck <[email protected]> wrote:
>
>> Hi Thomas,
>>
>> I figured out what goes wrong on the R side for the following query:
>>
>> <?xml version='1.0' encoding='UTF-8'?>
>> <!DOCTYPE Query>
>> <Query virtualSchemaName = 'default' uniqueRows = '1' count = '0'
>>        datasetConfigVersion = '0.6' header='1' requestid= 'biomaRt'>
>>   <Dataset name = 'hsapiens_gene_ensembl'>
>>     <Attribute name = 'ensembl_gene_id'/>
>>     <Attribute name = 'hsapiens_paralog_ensembl_gene'/>
>>     <Attribute name = 'hsapiens_paralog_perc_id'/>
>>     <Attribute name = 'hsapiens_paralog_perc_id_r1'/>
>>     <Filter name = 'ensembl_gene_id' value = 'ENSG00000001561' />
>>   </Dataset>
>> </Query>
>>
>> When a result comes back from the BioMart server, I map the header names
>> (e.g. "Ensembl Gene ID") back to the attribute names that were used in
>> the query (ensembl_gene_id), and I return a matrix to the user with the
>> attribute names instead of the attribute descriptions.
>>
>> However, for the hsapiens_paralog_perc_id attributes, the results from
>> the BioMart server carry the header "% Identity with respect to query
>> gene". Many attribute names share this same description, so I cannot map
>> these back to the original attribute name, and the R query crashes.
>>
>> Is there a way to write my XML query so that the header contains the
>> attribute names instead of the attribute descriptions, so I don't have to
>> map things back?
>>
>> Cheers,
>> Steffen
>>
>>
>>
>> On Wed, May 22, 2013 at 2:55 PM, Steffen Durinck <[email protected]> wrote:
>>
>>> Hi Thomas, Benjamin,
>>>
>>> The problem is on the R side: the "%" symbol in the attribute descriptions
>>> (% Identity with respect to query gene) is causing trouble, and I am
>>> working on a fix. Until then, you can add bmHeader=FALSE to your getBM
>>> query and things should work (see below):
>>>
>>> > human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>>> > attributes = c("ensembl_gene_id", "mmusculus_homolog_ensembl_gene", "mmusculus_homolog_perc_id_r1")
>>> > attributes = c(attributes, "mmusculus_homolog_orthology_type", "mmusculus_homolog_subtype", "mmusculus_homolog_perc_id")
>>> > orth.mouse = getBM(attributes, filters = "with_homolog_mmus", values = TRUE, mart = human, bmHeader = FALSE)
>>> > dim(orth.mouse)
>>> [1] 22886 6
>>>
>>> Best,
>>> Steffen
>>>
>>>
>>>
>>> On Wed, May 22, 2013 at 7:05 AM, Thomas Maurel <[email protected]> wrote:
>>>
>>>> Dear Benjamin,
>>>>
>>>> I can't see what's wrong with your query, but it looks like the issue
>>>> is coming from the following attributes:
>>>> "mmusculus_homolog_orthology_type"
>>>> "mmusculus_homolog_subtype"
>>>> "mmusculus_homolog_perc_id"
>>>>
>>>> If you run the following query, you will get the same error back:
>>>>
>>>> > library(biomaRt)
>>>> > human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>>>> > attributes = c("ensembl_gene_id", "mmusculus_homolog_ensembl_gene", "mmusculus_homolog_orthology_type")
>>>> > orth.mouse = getBM(attributes, filters = "with_homolog_mmus", values = TRUE, mart = human, uniqueRows = TRUE)
>>>> Error in `[.data.frame`(result, , attributes) :
>>>>   undefined columns selected
>>>>
>>>> The query starts failing as soon as you add one of the attributes
>>>> listed above.
>>>> I have also tried your query with the host pointing to ensembl.org, and
>>>> I am getting the same error.
>>>> Since the query works fine in the BioMart interface, this might be
>>>> a bug in the biomaRt package. I would advise you to email the
>>>> Bioconductor list: [email protected]
>>>>
>>>> Hope this helps,
>>>> Regards,
>>>> Thomas
>>>> On 22 May 2013, at 09:47, Benjamin Dubreuil wrote:
>>>>
>>>> Hi folks,
>>>>
>>>> I'm having a problem with the getBM function.
>>>> I would like to retrieve the orthologous genes between human and mouse,
>>>> with their percentage identity to one another, their orthology
>>>> relationship and their common ancestor.
>>>>
>>>> I did this, and it works fine:
>>>>
>>>> > library(biomaRt)
>>>> > human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>>>> > attributes = c("ensembl_gene_id", "mmusculus_homolog_ensembl_gene", "mmusculus_homolog_perc_id_r1")
>>>> > orth.mouse = getBM(attributes, filters = "with_homolog_mmus", values = TRUE, mart = human, uniqueRows = TRUE)
>>>> > dim(orth.mouse)
>>>> [1] 22886 3
>>>>
>>>> But when I add some attributes, I get an error:
>>>>
>>>> > attributes = c(attributes, "mmusculus_homolog_orthology_type", "mmusculus_homolog_subtype", "mmusculus_homolog_perc_id")
>>>> > orth.mouse = getBM(attributes, filters = "with_homolog_mmus", values = TRUE, mart = human)
>>>> Error in `[.data.frame`(result, , attributes) :
>>>>   undefined columns selected
>>>>
>>>>
>>>> I've checked the attribute names; I didn't make any typos:
>>>> > listAttributes(human)[c(1,567,573:576),]
>>>>                                 name                           description
>>>> 1                    ensembl_gene_id                       Ensembl Gene ID
>>>> 567   mmusculus_homolog_ensembl_gene                 Mouse Ensembl Gene ID
>>>> 573 mmusculus_homolog_orthology_type                         Homology Type
>>>> 574        mmusculus_homolog_subtype                              Ancestor
>>>> 575        mmusculus_homolog_perc_id % Identity with respect to query gene
>>>> 576     mmusculus_homolog_perc_id_r1 % Identity with respect to Mouse gene
>>>>
>>>> Can anyone see what I'm doing wrong?
>>>>
>>>>
>>>> Best,
>>>>
>>>> Dubreuil Benjamin
>>>> E. Levy Group (The Cell architecture Lab)
>>>> Weizmann Institute of Science, ISRAEL
>>>> Kimmelman Building, 4th floor, room 410
>>>> _______________________________________________
>>>> Users mailing list
>>>> [email protected]
>>>> https://lists.biomart.org/mailman/listinfo/users
>>>>
>>>>
>>>> --
>>>> Thomas Maurel
>>>> Bioinformatician - Ensembl Production Team
>>>> European Bioinformatics Institute (EMBL-EBI)
>>>> Wellcome Trust Genome Campus, Hinxton
>>>> Cambridge - CB10 1SD - UK
>>>>
>>>
>>>
>>
>
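The header-mapping problem Steffen describes can be sketched in base R. This is a minimal, hypothetical illustration and not biomaRt's actual code. The helper names `map_by_description` and `map_by_position` and the small attribute table are assumptions made for this example; the attribute names and descriptions themselves are taken from the messages above. The sketch shows two things. First, looking up a description fails when several attributes share "% Identity with respect to query gene". Second, mapping by position (Syed's suggestion) still works in that case, provided it checks each column, because Steffen notes that the order may not be preserved for queries across two datasets.

```r
# Hypothetical excerpt of a BioMart attribute table (name -> description).
# Names and descriptions are copied from this thread; a real table would
# come from listAttributes(mart).
attr_table <- data.frame(
  name = c("ensembl_gene_id",
           "mmusculus_homolog_ensembl_gene",
           "mmusculus_homolog_perc_id",
           "mmusculus_homolog_perc_id_r1",
           "hsapiens_paralog_perc_id"),
  description = c("Ensembl Gene ID",
                  "Mouse Ensembl Gene ID",
                  "% Identity with respect to query gene",
                  "% Identity with respect to Mouse gene",
                  "% Identity with respect to query gene"),
  stringsAsFactors = FALSE
)

# Naive mapping (hypothetical helper): look each returned header up by
# description. match() returns only the FIRST matching row, so a description
# shared by several attributes resolves to whichever attribute comes first.
map_by_description <- function(header, attr_table) {
  attr_table$name[match(header, attr_table$description)]
}

# Positional mapping (hypothetical helper): assume the header comes back in
# the same order as the requested attributes, but verify each column's
# description first and stop if the order does not line up.
map_by_position <- function(header, query_attrs, attr_table) {
  if (length(header) != length(query_attrs)) {
    stop("header has ", length(header), " columns but ",
         length(query_attrs), " attributes were requested")
  }
  expected <- attr_table$description[match(query_attrs, attr_table$name)]
  # An unknown attribute (NA) or a differing description counts as a mismatch.
  bad <- which(is.na(expected) | expected != header)
  if (length(bad) > 0) {
    stop("header does not match the query order at column(s): ",
         paste(bad, collapse = ", "))
  }
  query_attrs
}

# Usage with a query shaped like the paralog example in the thread.
query_attrs <- c("ensembl_gene_id", "hsapiens_paralog_perc_id")
header <- c("Ensembl Gene ID", "% Identity with respect to query gene")

map_by_description(header, attr_table)
# returns c("ensembl_gene_id", "mmusculus_homolog_perc_id"): wrong second name

map_by_position(header, query_attrs, attr_table)
# returns c("ensembl_gene_id", "hsapiens_paralog_perc_id")
```

If the server returned the columns in a different order, the positional check would stop with an error rather than silently mislabel the columns.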
