Hi Yuan and Arek,
This is the feedback from our helpdesk team:
The Ensembl databases backing the genome browser web site and BioMart
are always in sync. BioMart is rebuilt with every Ensembl release.
The underlying problem here is that human ZNF226 and ZNF234 have been
accidentally merged on the basis of transcript ZNF226-206
(ENST00000536276). We require overlap of coding regions in the genome
to merge transcripts into gene clusters. Since transcript ZNF226-206
(ENST00000536276) overlaps the coding region of ZNF226-204
(ENST00000426739), which is really a representative of the ZNF234 gene,
we have merged both genes under ZNF226.
http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000167380;r=19:44645710-44681836
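To illustrate how a single bridging transcript can pull two genes into one
cluster, here is a minimal sketch of overlap-based clustering. It is not the
Ensembl genebuild code: the Transcript class, the function name and all
coordinates are hypothetical, and it assumes every transcript lies on the same
strand of the same chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    cds_start: int  # hypothetical coding start (inclusive)
    cds_end: int    # hypothetical coding end (inclusive)


def cluster_by_coding_overlap(transcripts):
    """Group transcripts whose coding regions overlap, directly or via a chain."""
    clusters = []
    for t in sorted(transcripts, key=lambda t: t.cds_start):
        # Sorted by start, so t joins the current cluster if it begins
        # before the furthest coding end seen so far in that cluster.
        if clusters and t.cds_start <= clusters[-1]["end"]:
            clusters[-1]["members"].append(t.name)
            clusters[-1]["end"] = max(clusters[-1]["end"], t.cds_end)
        else:
            clusters.append({"end": t.cds_end, "members": [t.name]})
    return [c["members"] for c in clusters]


# Toy coordinates, NOT the real ZNF226/ZNF234 annotation.
gene_a = Transcript("gene_A_transcript", 100, 400)
gene_b = Transcript("gene_B_transcript", 600, 900)
bridge = Transcript("bridging_transcript", 350, 650)

without_bridge = cluster_by_coding_overlap([gene_a, gene_b])
with_bridge = cluster_by_coding_overlap([gene_a, gene_b, bridge])
print(without_bridge)  # two separate clusters
print(with_bridge)     # one merged cluster
```

In this toy case, removing the bridging transcript is enough to split the
cluster back into two genes.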
Selecting transcript ZNF226-204 (ENST00000426739) and following the
"General Identifiers" link in the navigation column shows that this
transcript clearly maps to the external UniProtKB/Swiss-Prot and NCBI
RefSeq records for ZNF234.
http://www.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000167380;r=19:44645710-44681836;t=ENST00000426739
However, even on the Ensembl web site the merged gene is associated
with both gene symbols. This can be seen by following the "External
identifiers" link in the navigation column of the Gene page, where this
gene is associated with HGNC symbols ZNF226 and ZNF234.
http://www.ensembl.org/Homo_sapiens/Gene/Matches?g=ENSG00000167380;r=19:44645710-44681836
Now, we realise that this accidental gene merge is far from ideal and
really confusing to our users. Transcript ZNF226-206
(ENST00000536276), which causes the merge, has been annotated on the
basis of UniProtKB/TrEMBL record O14859, which is only a very short
fragment of a zinc finger protein (68 amino acid residues).
http://www.ensembl.org/Homo_sapiens/Transcript/SupportingEvidence?db=core;g=ENSG00000167380;r=19:44645710-44681836;t=ENST00000536276
http://www.uniprot.org/uniprot/O14859
Since it is based on weak supporting evidence, we will delete this
transcript in one of the upcoming releases and may also ask UniProtKB
to remove this rather weak record.
Regards
Rhoda
On 29 Nov 2011, at 00:07, Arek Kasprzyk wrote:
Hi Rhoda
do you have any more insight on what's going on there?
a
On Mon, Nov 28, 2011 at 1:28 PM, Li, Yongjin <[email protected]>
wrote:
duplicate gene?
________________________________________
From: [email protected] [[email protected]] on
behalf of Yuan Hao [[email protected]]
Sent: Monday, November 28, 2011 6:04 AM
To: [email protected]
Subject: [BioMart Users] ENSG00000167380 biomart record question
Dear Biomart team,
I used BioMart in R/Bioconductor to map Ensembl gene IDs to gene
symbols. I found that a single gene ID is sometimes associated with
multiple gene symbols, for example ENSG00000167380, which is associated
with both 'ZNF234' and 'ZNF226'.
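One way to spot such cases in an exported ID-to-symbol table is sketched
below. It assumes the mapping is already available as (gene ID, symbol) pairs;
the rows list, the function names and the ENSG_EXAMPLE_1 / GENE_A row are
hypothetical, and no BioMart query is made.

```python
from collections import defaultdict

# (gene_id, symbol) pairs as a mapping export might contain them.
rows = [
    ("ENSG00000167380", "ZNF226"),
    ("ENSG00000167380", "ZNF234"),
    ("ENSG_EXAMPLE_1", "GENE_A"),  # hypothetical placeholder row
]


def symbols_per_gene(pairs):
    """Collect the set of distinct symbols seen for each gene ID."""
    mapping = defaultdict(set)
    for gene_id, symbol in pairs:
        if symbol:  # skip empty symbols
            mapping[gene_id].add(symbol)
    return mapping


def ambiguous_genes(pairs):
    """Return only the gene IDs that map to more than one symbol."""
    return {
        gene_id: sorted(symbols)
        for gene_id, symbols in symbols_per_gene(pairs).items()
        if len(symbols) > 1
    }


ambiguous = ambiguous_genes(rows)
print(ambiguous)
```

Flagging these IDs before downstream analysis makes it an explicit choice
which symbol to keep, rather than silently taking the first match.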
Searching for 'ZNF234' in the Ensembl browser returned 'ZNF226' (chr19:
44,645,710 - 44,681,836) instead, which suggests that the latter took
over both. Does this mean the gene record in this region has been
updated, but not yet in R/Bioconductor? I looked into the UCSC browser
as well, where two separate records are still kept for the two gene
symbols: ZNF234 (chr19: 44,645,710 - 44,664,460), ZNF226
(chr19:44,669,249 - 44,681,836). It would be very much appreciated if
you could help clarify this. Thank you very much in advance!
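As a quick check on the coordinates quoted above, the sketch below compares
the two UCSC records with the single Ensembl record. The coordinates are the
ones given in this message; the variable and function names are only
illustrative.

```python
# Coordinates on chr19 as quoted in this message.
ucsc = {
    "ZNF234": (44645710, 44664460),
    "ZNF226": (44669249, 44681836),
}
ensembl_merged = (44645710, 44681836)


def union_span(intervals):
    """Smallest single interval covering all the given intervals."""
    starts, ends = zip(*intervals)
    return min(starts), max(ends)


def overlaps(a, b):
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


covers_both = union_span(ucsc.values()) == ensembl_merged
genes_overlap = overlaps(ucsc["ZNF234"], ucsc["ZNF226"])
print(covers_both, genes_overlap)
```

The Ensembl record spans exactly the union of the two UCSC records, while the
two UCSC gene spans themselves do not overlap.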
Cheers,
Yuan
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
Rhoda Kinsella Ph.D.
Ensembl Production Project Leader,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.