Hi Steffen,
Yes, the error is thrown from the BioMart server side. But this is because the
original query XML the server received is not well-formed (as the error message
shows).
The original query XML probably looked like this:
<Query>
<Dataset name = 'hsapiens_gene_ensembl'>
<Filter name = 'hgnc_symbol' value = '2'-PDE'/>
<Attribute name = 'ensembl_gene_id' />
<Attribute name = 'hgnc_symbol' />
</Dataset>
</Query>
If that's the case, the solution is actually simple: we just need to escape the
apostrophe inside the value using the XML entity "&apos;":
<Query>
<Dataset name = 'hsapiens_gene_ensembl'>
<Filter name = 'hgnc_symbol' value = '2&apos;-PDE'/>
<Attribute name = 'ensembl_gene_id' />
<Attribute name = 'hgnc_symbol' />
</Dataset>
</Query>
Do you think it would be possible for biomaRt to escape special characters
(e.g. '<', '>', '&', quotes) in the query XML before it is sent to the BioMart
server?
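For illustration, here is a minimal sketch of what such escaping could look like in R. The helper name escape_xml is hypothetical and not part of biomaRt; it simply replaces the five predefined XML special characters before the value is pasted into the query:

```r
# Hypothetical helper (NOT part of biomaRt): escape the five predefined
# XML special characters so a value can be embedded in an attribute.
escape_xml <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)   # must run first, before other entities are added
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x <- gsub("'", "&apos;", x, fixed = TRUE)
  x <- gsub("\"", "&quot;", x, fixed = TRUE)
  x
}

# Build the filter element with the escaped value
filter_xml <- sprintf("<Filter name = 'hgnc_symbol' value = '%s'/>",
                      escape_xml("2'-PDE"))
print(filter_xml)
```

The ampersand is replaced first so that the entities introduced by the later substitutions are not escaped a second time.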
Thanks,
Junjun
From: Steffen Durinck <[email protected]<mailto:[email protected]>>
Date: Fri, 15 Jul 2011 15:46:00 -0400
To: Marie Wong-Erasmus
<[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: [BioMart Users] bug when gene ID contains an apostrophe
The error is thrown on the BioMart side; I don't think it likes quotes in gene
symbols.
In addition, if you want a mapping between all gene symbols and ensembl gene
ids you could do this in one query in R:
library(biomaRt)
mart.ens <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
map = getBM(attributes=c("ensembl_gene_id", "hgnc_symbol"),
filters="with_hgnc", values=TRUE, mart=mart.ens)
head(map)
ensembl_gene_id hgnc_symbol
1 ENSG00000249567 MIMT1
2 ENSG00000246493 SNHG8
3 ENSG00000187667 WHAMML1
4 ENSG00000248334 WHAMML2
5 ENSG00000225273 UBE2Q2P2
6 ENSG00000186615 C14orf33
Steffen
On Fri, Jul 15, 2011 at 10:45 AM, Marie Wong-Erasmus
<[email protected]<mailto:[email protected]>> wrote:
Hi Tim,
You might want to post this to the bioconductor mailing list to get them to
handle single quotes in the value field.
Either way, aliases should just return an empty set.
Only HGNC symbols that are not synonyms will have an associated ENSG ID.
So if you used IMP8, which is an alias for IPO8, you would get an empty set, which
is what should also be returned for 2'-PDE.
Marie
From: Timothée Flutre <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Date: Fri, 15 Jul 2011 12:13:41 -0400
To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: [BioMart Users] bug when gene ID contains an apostrophe
Hello,
I am using the R package "biomaRt" to find Ensembl IDs from a list of HGNC gene
IDs:
source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt",lib="~/src/Rlibs/")
library(biomaRt)
mart.ens <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
getBM(attributes=c("ensembl_gene_id", "hgnc_symbol"), filters="hgnc_symbol",
values="IPO8", mart=mart.ens)
ensembl_gene_id hgnc_symbol
1 ENSG00000133704 IPO8
It's working pretty well until an HGNC ID contains an apostrophe (here "2'-PDE"
is an alias for the gene "PDE12", see
here<http://www.genenames.org/data/hgnc_data.php?hgnc_id=25386>):
getBM(attributes=c("ensembl_gene_id", "hgnc_symbol"), filters="hgnc_symbol",
values="2'-PDE", mart=mart.ens)
Error in getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), filters =
"hgnc_symbol", :
Query ERROR: caught BioMart::Exception: non-BioMart die():
not well-formed (invalid token) at line 1, column 322, byte 322 at
/usr/lib/perl5/XML/Parser.pm line 187
I know how to work around this for my own case, but would it be possible to fix
this for a future release?
Best regards,
Tim
_______________________________________________
Users mailing list
[email protected]<mailto:[email protected]>
https://lists.biomart.org/mailman/listinfo/users