Re: [Bioc-devel] Question about org.Dr.eg.db package
Glad to help!

On Thu, Aug 13, 2020 at 5:51 PM Margolin, Gennady (NIH/NICHD) [C] <gennady.margo...@nih.gov> wrote:
> Hi Jim,
>
> Awesome, that makes sense now. I was wondering whether org.Dr.eg.db has only functional annotation, which is what I thought, since it did not refer to a specific genome (unlike the TxDb packages), but then I found what I described in my previous emails.
>
> Thank you very much,
> Gennady
Re: [Bioc-devel] Question about org.Dr.eg.db package
Hi Jim,

Awesome, that makes sense now. I was wondering whether org.Dr.eg.db has only functional annotation, which is what I thought, since it did not refer to a specific genome (unlike the TxDb packages), but then I found what I described in my previous emails.

Thank you very much,
Gennady

From: "James W. MacDonald"
Reply-To: "jmac...@u.washington.edu"
Date: Thursday, August 13, 2020 at 5:41 PM
To: "Margolin, Gennady (NIH/NICHD) [C]"
Cc: Vincent Carey, "bioc-devel@r-project.org"
Subject: Re: [Bioc-devel] Question about org.Dr.eg.db package
Re: [Bioc-devel] Question about org.Dr.eg.db package
Hi Gennady,

That information should probably be cleaned up, and the Bimaps that point to the location data removed. While the OrgDbs do contain position information, it has been deprecated, as you would find if you tried to query it using select():

    > select(org.Dr.eg.db, "30037", "CHR")
    'select()' returned 1:1 mapping between keys and columns
      ENTREZID CHR
    1    30037   5
    Warning message:
    In .deprecatedColsMessage() :
      Accessing gene location information via 'CHR','CHRLOC','CHRLOCEND' is
      deprecated. Please use a range based accessor like genes(), or select()
      with columns values like TXCHROM and TXSTART on a TxDb or OrganismDb
      object instead.

The rationale is that the OrgDb packages are intended to contain functional annotations, which are not based on any genome build and are instead current as of the construction of the OrgDb package. Since positional information should be based on a genome release, those data have been migrated to the TxDb and EnsDb packages, which are based on a given release.

Put a different way, the data in an OrgDb package are downloaded from NCBI as of a particular date, and the positional data we get are whatever NCBI provided on that date. This is obviously a problem for the positional data, because what we get isn't necessarily build-specific. We get the TxDb data from the UCSC Genome Browser, which is build-specific, so we can tell end users exactly which build the data come from. Ideally these data would be defunct in the OrgDb packages, but that hasn't happened yet.

Best,

Jim

On Thu, Aug 13, 2020 at 4:39 PM Margolin, Gennady (NIH/NICHD) [C] via Bioc-devel wrote:
> Hi Vincent,
>
> Thank you for responding.
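A minimal sketch of the range-based alternative Jim describes, assuming the Bioconductor annotation package TxDb.Drerio.UCSC.danRer11.refGene is installed (the choice of package is an assumption; any zebrafish TxDb built for a known assembly would serve). It also assumes, as is usual for UCSC refGene-based TxDbs, that gene IDs are Entrez Gene IDs, so the key 30037 from the deprecated OrgDb query can be reused. Unlike the OrgDb, the TxDb reports which assembly its coordinates belong to.

```r
# Sketch: look up a gene's location in a build-specific TxDb instead of the OrgDb.
# Assumption: TxDb.Drerio.UCSC.danRer11.refGene is installed; its gene IDs are
# assumed to be Entrez Gene IDs (usual for UCSC refGene-based TxDbs).
suppressPackageStartupMessages({
  library(GenomicFeatures)
  library(TxDb.Drerio.UCSC.danRer11.refGene)
})

txdb <- TxDb.Drerio.UCSC.danRer11.refGene

# The TxDb records the assembly its coordinates refer to.
build <- unique(genome(txdb))
print(build)

# Range-based accessor: one GRanges element per gene, named by gene ID.
gr <- genes(txdb)
gene_range <- gr[names(gr) == "30037"]
print(gene_range)

# select() with the transcript-level location columns the warning suggests.
tx_loc <- select(txdb,
                 keys = "30037",
                 columns = c("TXCHROM", "TXSTART", "TXEND", "TXSTRAND"),
                 keytype = "GENEID")
print(tx_loc)
```

If the gene is not in the refGene track, `gene_range` is simply empty. Note that `genes()` by default drops genes whose exons lie on both strands or on different sequences, and reports how many it dropped.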
Re: [Bioc-devel] Question about org.Dr.eg.db package
Hi Vincent,

Thank you for responding.

Here is an excerpt from the package's R documentation help page (I have version 3.10.0; I doubt anything changed in the latest one, which is 3.11.4):

----------
org.Dr.egCHRLOC {org.Dr.eg.db}
Entrez Gene IDs to Chromosomal Location

Description

org.Dr.egCHRLOC is an R object that maps entrez gene identifiers to the starting position of the gene. The position of a gene is measured as the number of base pairs.

The CHRLOCEND mapping is the same as the CHRLOC mapping except that it specifies the ending base of a gene instead of the start.
……
----------

This output also does not show any genome version:

    > org.Dr.eg_dbInfo()
                         name value
     1    DBSCHEMAVERSION 2.1
     2            Db type OrgDb
     3 Supporting package AnnotationDbi
     4           DBSCHEMA ZEBRAFISH_DB
     5           ORGANISM Danio rerio
     6            SPECIES Zebrafish
     7       EGSOURCEDATE 2019-Jul10
     8       EGSOURCENAME Entrez Gene
     9        EGSOURCEURL ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
    10          CENTRALID EG
    11              TAXID 7955
    12       GOSOURCENAME Gene Ontology
    13        GOSOURCEURL ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
    14       GOSOURCEDATE 2019-Jul10
    15     GOEGSOURCEDATE 2019-Jul10
    16     GOEGSOURCENAME Entrez Gene
    17      GOEGSOURCEURL ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
    18     KEGGSOURCENAME KEGG GENOME
    19      KEGGSOURCEURL ftp://ftp.genome.jp/pub/kegg/genomes
    20     KEGGSOURCEDATE 2011-Mar15
    21       GPSOURCENAME UCSC Genome Bioinformatics (Danio rerio)
    22        GPSOURCEURL
    23       GPSOURCEDATE 2017-Nov1
    24       ENSOURCEDATE 2019-Jun24
    25       ENSOURCENAME Ensembl
    26        ENSOURCEURL ftp://ftp.ensembl.org/pub/current_fasta
    27       UPSOURCENAME Uniprot
    28        UPSOURCEURL http://www.UniProt.org/
    29       UPSOURCEDATE Mon Oct 21 14:32:30 2019

From: Vincent Carey
Date: Thursday, August 13, 2020 at 2:46 PM
To: "Margolin, Gennady (NIH/NICHD) [C]"
Cc: "bioc-devel@r-project.org"
Subject: Re: [Bioc-devel] Question about org.Dr.eg.db package
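The situation described in this message can be reproduced from inside R. The sketch below reads the legacy CHRLOC Bimap documented on that help page and then keeps only the genome-position (GPSOURCE*) rows of the metadata table. It assumes an org.Dr.eg.db version that still ships `org.Dr.egCHRLOC`, as 3.10.0 does; later versions may drop it. Those rows name a source and a date, but no assembly identifier.

```r
# Sketch: inspect the legacy position Bimap and the metadata of org.Dr.eg.db.
# Assumption: the installed org.Dr.eg.db still provides org.Dr.egCHRLOC
# (true for 3.10.0, whose help page is quoted above).
suppressPackageStartupMessages(library(org.Dr.eg.db))

# Legacy Bimap interface: Entrez Gene ID -> start position(s), named by chromosome.
chrloc <- mget("30037", org.Dr.egCHRLOC, ifnotfound = NA)
print(chrloc)

# The metadata table printed by org.Dr.eg_dbInfo(); keep only the rows that
# describe where the gene-position data came from.
info <- org.Dr.eg_dbInfo()
gp_rows <- info[grepl("^GPSOURCE", info$name), ]
print(gp_rows)
```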
Re: [Bioc-devel] Question about org.Dr.eg.db package
This should probably be posed to the support site. What version of the package are you using? Where are you seeing coordinates? I would expect those to be obtained from the TxDb package, or perhaps from AnnotationHub.

    > columns(org.Dr.eg.db)
     [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
     [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"
    [11] "GO"           "GOALL"        "IPI"          "ONTOLOGY"     "ONTOLOGYALL"
    [16] "PATH"         "PFAM"         "PMID"         "PROSITE"      "REFSEQ"
    [21] "SYMBOL"       "UNIGENE"      "UNIPROT"      "ZFIN"

On Thu, Aug 13, 2020 at 2:13 PM Margolin, Gennady (NIH/NICHD) [C] via Bioc-devel wrote:
> Hello,
>
> I have a short question: how do I figure out the genome version for the org.Dr.eg.db package? I couldn't see it in the DESCRIPTION, and it's also not in the org.Dr.eg_dbInfo() output. It would be nice to know whether this is danRer11/GRCz11 or some other assembly, as there are coordinates present in the DB.
>
> Thank you,
> Gennady

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
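Vincent's columns() listing suggests a quick check: none of the location keys named in the deprecation warning (CHR, CHRLOC, CHRLOCEND) appears among the advertised columns in that version, whereas functional fields such as SYMBOL and GENENAME do. A minimal sketch, assuming org.Dr.eg.db is installed; the Entrez Gene ID 30037 is simply the one used elsewhere in this thread.

```r
# Sketch: use the OrgDb for build-independent functional annotation,
# keyed by Entrez Gene ID.
suppressPackageStartupMessages(library(org.Dr.eg.db))

cols <- columns(org.Dr.eg.db)

# Location fields named in the deprecation warning; with the columns() listing
# shown above, this intersection would be empty.
location_cols <- intersect(c("CHR", "CHRLOC", "CHRLOCEND"), cols)
print(location_cols)

# Functional annotation for one gene.
ann <- select(org.Dr.eg.db,
              keys = "30037",
              columns = c("SYMBOL", "GENENAME"),
              keytype = "ENTREZID")
print(ann)
```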
[Bioc-devel] Question about org.Dr.eg.db package
Hello,

I have a short question: how do I figure out the genome version for the org.Dr.eg.db package? I couldn't see it in the DESCRIPTION, and it's also not in the org.Dr.eg_dbInfo() output. It would be nice to know whether this is danRer11/GRCz11 or some other assembly, as there are coordinates present in the DB.

Thank you,
Gennady
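The check described in this question can be scripted. A minimal sketch, assuming org.Dr.eg.db is installed: it searches both the package DESCRIPTION fields and the values returned by org.Dr.eg_dbInfo() for a zebrafish assembly name. The pattern "GRCz|danRer" is an assumption chosen to cover the two naming styles mentioned (GRCz11, danRer11). Gennady reports finding no assembly name in either source.

```r
# Sketch: look for an assembly name in the package DESCRIPTION and in the
# metadata returned by org.Dr.eg_dbInfo().
# Assumption: the search pattern covers both naming styles (GRCz11 / danRer11).
suppressPackageStartupMessages(library(org.Dr.eg.db))

pattern <- "GRCz|danRer"

desc_fields <- unlist(packageDescription("org.Dr.eg.db"))
info <- org.Dr.eg_dbInfo()

hits <- c(grep(pattern, desc_fields, value = TRUE),
          grep(pattern, info$value, value = TRUE))

if (length(hits) == 0) {
  message("No assembly name found in DESCRIPTION or org.Dr.eg_dbInfo().")
} else {
  print(hits)
}
```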