Re: [Bioc-devel] Question about org.Dr.eg.db package

2020-08-13 Thread James W. MacDonald
Glad to help!


Re: [Bioc-devel] Question about org.Dr.eg.db package

2020-08-13 Thread Margolin, Gennady (NIH/NICHD) [C] via Bioc-devel
Hi Jim,

Awesome, that makes sense now. I had assumed that org.Dr.eg.db contains only
functional annotation, since, unlike the TxDb packages, it does not refer to a
specific genome. But then I found the position data I described in my previous
emails.

Thank you very much,
Gennady


Re: [Bioc-devel] Question about org.Dr.eg.db package

2020-08-13 Thread James W. MacDonald
Hi Gennady,

That information should probably be cleaned up, and the BiMaps that point
to the location data removed. While the OrgDbs do contain position
information, it's been deprecated, which you would find if you tried to
query using select():

> select(org.Dr.eg.db, "30037", "CHR")
'select()' returned 1:1 mapping between keys and columns
  ENTREZID CHR
1    30037   5
Warning message:
In .deprecatedColsMessage() :
  Accessing gene location information via 'CHR','CHRLOC','CHRLOCEND' is
  deprecated. Please use a range based accessor like genes(), or select()
  with columns values like TXCHROM and TXSTART on a TxDb or OrganismDb
  object instead.

The rationale is that the OrgDb packages are intended to contain functional
annotations, which are not tied to any genome build and are simply current as
of the date the OrgDb package was built. Since positional information should
be based on a specific genome release, those data have been migrated to the
TxDb and EnsDb packages, each of which is based on a given release.

Put a different way, the data in an OrgDb package are downloaded from NCBI
as of a particular date, and the positional data are whatever NCBI provided
on that date. This is obviously a problem for the positional data, because
what we get isn't necessarily build-specific. We get the TxDb data from the
UCSC Genome Browser, which is build-specific, so we can tell end users
exactly what build the data come from. Ideally the positional data would be
defunct in the OrgDb packages, but that hasn't happened yet.
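
As a hedged illustration of the route the deprecation warning recommends, the
sketch below looks up the same gene in a TxDb package instead of the OrgDb. It
assumes that the TxDb.Drerio.UCSC.danRer11.refGene package is installed and
that its GENEID values are Entrez Gene IDs; neither is stated in this thread,
and gene_coords is a hypothetical helper name.

```r
# Hypothetical helper: build-specific transcript coordinates for one gene.
# Assumes `txdb` is a TxDb object whose GENEID keys are Entrez Gene IDs.
gene_coords <- function(txdb, entrez_id) {
  AnnotationDbi::select(txdb,
                        keys    = entrez_id,
                        columns = c("TXCHROM", "TXSTART", "TXEND"),
                        keytype = "GENEID")
}

# Assumed package name; the block is skipped if it is not installed.
if (requireNamespace("TxDb.Drerio.UCSC.danRer11.refGene", quietly = TRUE)) {
  library(TxDb.Drerio.UCSC.danRer11.refGene)
  txdb <- TxDb.Drerio.UCSC.danRer11.refGene
  print(txdb)                        # printed summary includes the genome build
  print(gene_coords(txdb, "30037"))  # same Entrez Gene ID as in the query above
}
```

Unlike the OrgDb query, the coordinates returned this way can be tied to the
build that the TxDb object reports for itself.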

Best,

Jim




Re: [Bioc-devel] Question about org.Dr.eg.db package

2020-08-13 Thread Margolin, Gennady (NIH/NICHD) [C] via Bioc-devel
Hi Vincent,

Thank you for responding.

Here is an excerpt from the R help page for this package (I have version
3.10.0; I doubt anything changed in the latest one, which is 3.11.4):

-
org.Dr.egCHRLOC {org.Dr.eg.db}
Entrez Gene IDs to Chromosomal Location
Description
org.Dr.egCHRLOC is an R object that maps entrez gene identifiers to the 
starting position of the gene. The position of a gene is measured as the number 
of base pairs.
The CHRLOCEND mapping is the same as the CHRLOC mapping except that it 
specifies the ending base of a gene instead of the start.
……
-

This output also does not show any genome version:
> org.Dr.eg_dbInfo()
   name                value
1  DBSCHEMAVERSION     2.1
2  Db type             OrgDb
3  Supporting package  AnnotationDbi
4  DBSCHEMA            ZEBRAFISH_DB
5  ORGANISM            Danio rerio
6  SPECIES             Zebrafish
7  EGSOURCEDATE        2019-Jul10
8  EGSOURCENAME        Entrez Gene
9  EGSOURCEURL         ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
10 CENTRALID           EG
11 TAXID               7955
12 GOSOURCENAME        Gene Ontology
13 GOSOURCEURL         ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
14 GOSOURCEDATE        2019-Jul10
15 GOEGSOURCEDATE      2019-Jul10
16 GOEGSOURCENAME      Entrez Gene
17 GOEGSOURCEURL       ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
18 KEGGSOURCENAME      KEGG GENOME
19 KEGGSOURCEURL       ftp://ftp.genome.jp/pub/kegg/genomes
20 KEGGSOURCEDATE      2011-Mar15
21 GPSOURCENAME        UCSC Genome Bioinformatics (Danio rerio)
22 GPSOURCEURL
23 GPSOURCEDATE        2017-Nov1
24 ENSOURCEDATE        2019-Jun24
25 ENSOURCENAME        Ensembl
26 ENSOURCEURL         ftp://ftp.ensembl.org/pub/current_fasta
27 UPSOURCENAME        Uniprot
28 UPSOURCEURL         http://www.UniProt.org/
29 UPSOURCEDATE        Mon Oct 21 14:32:30 2019
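
To make the point concrete, here is a minimal sketch that pulls out only the
source-date rows from a data frame shaped like the output above (columns
`name` and `value`). The helper name source_dates is hypothetical, and the
small data frame is hand-copied from a few rows of that output so the sketch
runs without Bioconductor installed.

```r
# Hypothetical helper: keep the rows whose name ends in "SOURCEDATE".
source_dates <- function(info) {
  info[grepl("SOURCEDATE$", info$name), , drop = FALSE]
}

# Hand-copied subset of the metadata shown above (illustration only).
info <- data.frame(
  name  = c("ORGANISM", "EGSOURCEDATE", "GPSOURCENAME", "GPSOURCEDATE"),
  value = c("Danio rerio", "2019-Jul10",
            "UCSC Genome Bioinformatics (Danio rerio)", "2017-Nov1"),
  stringsAsFactors = FALSE
)

print(source_dates(info))

# With the package loaded, the same helper could presumably be applied to the
# real metadata: source_dates(org.Dr.eg_dbInfo())
```

The rows it returns are download dates for each source, not genome assembly
names, which is why no build can be read off this table.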


Re: [Bioc-devel] Question about org.Dr.eg.db package

2020-08-13 Thread Vincent Carey
This should probably be posed to the support site.  What version of the
package are you using?  Where are you seeing coordinates?  I would expect
those to be obtained from the TxDb package, or perhaps from AnnotationHub.

> columns(org.Dr.eg.db)

 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"
[11] "GO"           "GOALL"        "IPI"          "ONTOLOGY"     "ONTOLOGYALL"
[16] "PATH"         "PFAM"         "PMID"         "PROSITE"      "REFSEQ"
[21] "SYMBOL"       "UNIGENE"      "UNIPROT"      "ZFIN"


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Question about org.Dr.eg.db package

2020-08-13 Thread Margolin, Gennady (NIH/NICHD) [C] via Bioc-devel
Hello,

I have a short question: how do I figure out the genome version for the
org.Dr.eg.db package? I couldn’t see it in the DESCRIPTION, and it’s not in the
org.Dr.eg_dbInfo() output either. It would be nice to know whether this is
danRer11/GRCz11 or some other assembly, as there are coordinates present in the DB.

Thank you,
Gennady
