Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

2019-04-26 Thread Aaron Lun
Thanks Daniel. Glad to see the end of that monkey business, my analyses
were going bananas.

On Fri, Apr 26, 2019 at 3:41 PM Van Twisk, Daniel <
daniel.vantw...@roswellpark.org> wrote:

> I've pushed new 3.8.2 orgdbs that should propagate soon. They do not have
> this issue.

Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

2019-04-26 Thread Van Twisk, Daniel
I've pushed new 3.8.2 orgdbs that should propagate soon. They do not have this 
issue.
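
One way to confirm that an installation has picked up the fix is to compare the installed package version against 3.8.2, the release described above as free of the problem. This is only a minimal sketch: the helper name `has_fixed_orgdb` is made up for illustration, and treating every version from 3.8.2 onward as clean is an assumption based solely on the statement above.

```r
# Hypothetical helper: TRUE when a version string is at or beyond the
# release reported in this thread as fixed (3.8.2).
has_fixed_orgdb <- function(installed, fixed = "3.8.2") {
    package_version(installed) >= package_version(fixed)
}

has_fixed_orgdb("3.8.0")  # the version from the original report
has_fixed_orgdb("3.8.2")

# Against a real installation (skipped if the package is not installed):
if (nzchar(system.file(package = "org.Hs.eg.db"))) {
    print(has_fixed_orgdb(as.character(packageVersion("org.Hs.eg.db"))))
}
```

If the check comes back FALSE, rerunning BiocManager::install() once the new packages have propagated should pick up the newer build.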



Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

2019-04-25 Thread Pages, Herve
Hi Aaron,

On 4/25/19 16:44, Aaron Lun wrote:

> It doesn't seem like it - on my installation, org.Hs.eg.db is still...
> monkeying around.


  __
 w  c(..)o   (
  \__(-)__)
  /\   (
 /(_)___)
 w /|
  | \
 m  m

Daniel has prepared a new batch of *.db0 and org.* packages (v 3.8.1). The new 
packages are on their way and should become available via 
BiocManager::install() in the next 12 hours or so.

Hopefully they'll put an end to the Great Monkey Conspiracy!

Unfortunately we won't see the effect on tomorrow's build report, only on 
Saturday's report.

Cheers,

H.

Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

2019-04-25 Thread Vincent Carey
Has this situation been rectified?

On Tue, Apr 23, 2019 at 11:40 AM Van Twisk, Daniel <
daniel.vantw...@roswellpark.org> wrote:

> We've made some changes to our annotation generation scripts this release
> and it seems these may have introduced some errors. Thank you for
> identifying this issue and I will try to have some fixes out asap.

Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

2019-04-23 Thread Van Twisk, Daniel
We've made some changes to our annotation generation scripts this release and 
it seems these may have introduced some errors. Thank you for identifying this 
issue and I will try to have some fixes out asap.



Re: [Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

2019-04-23 Thread James W. MacDonald
Looks like the ensembl table of the human.db0 package got polluted with *Pan
troglodytes* genes:

> con <- dbConnect(SQLite(),
+     "/R-devel/lib64/R/library/human.db0/extdata/chipsrc_human.sqlite")
> dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSPTR%';")
  count(*)
1    16207
> dbGetQuery(con, "select count(*) from ensembl where ensid like 'ENSG%';")
  count(*)
1    28973
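
The same check can be generalised to tally every identifier prefix in one pass, which would also reveal contamination from species other than chimpanzee. This is a minimal base-R sketch, assuming the `ensid` column has already been pulled into a character vector; the function name `tally_ensembl_prefixes` and the toy identifiers are made up for illustration.

```r
# Hypothetical helper: count Ensembl IDs by their alphabetic prefix,
# i.e. everything before the first digit.
tally_ensembl_prefixes <- function(ids) {
    prefixes <- sub("[0-9].*$", "", ids)
    sort(table(prefixes), decreasing = TRUE)
}

# Toy input; real input could come from
#   dbGetQuery(con, "select ensid from ensembl")$ensid
toy_ids <- c("ENSG00000000001", "ENSG00000000002",
             "ENSPTRG00000000001")
tally_ensembl_prefixes(toy_ids)
```

In a clean human table, one would expect only the ENSG prefix to show up.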

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


[Bioc-devel] Weird monkey identifiers in org.Hs.eg.db

2019-04-22 Thread Aaron Lun

Playing around with org.Hs.eg.db 3.8.0. What on earth is ENSPTRG...?

> library(org.Hs.eg.db)
> mapIds(org.Hs.eg.db, key="GCG", keytype="SYMBOL", column="ENSEMBL")
'select()' returned 1:many mapping between keys and columns
                 GCG
"ENSPTRG00000000777"

Well, at least it still recovers the right identifier... eventually.

> select(org.Hs.eg.db, key="GCG", keytype="SYMBOL", columns="ENSEMBL")
'select()' returned 1:many mapping between keys and columns
  SYMBOL            ENSEMBL
1    GCG ENSPTRG00000000777
2    GCG    ENSG00000115263

The SYMBOL->Entrez ID relational table seems to be okay:

> Y <- toTable(org.Hs.egSYMBOL)
> Y[which(Y[,2]=="GCG"),]
     gene_id symbol
2152    2641    GCG

So the cause is the Ensembl->Entrez mappings:

> Z <- toTable(org.Hs.egENSEMBL2EG)
> Z[Z[,1]==2641,]
     gene_id         ensembl_id
3028    2641 ENSPTRG00000000777
3029    2641    ENSG00000115263

Googling suggests that ENSPTRG00000000777 is an identifier for some 
other gene in one of the other monkeys. Hardly "Hs" stuff.
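
Until corrected packages are available, one possible stopgap is to discard any mapping whose Ensembl ID lacks the human gene prefix. The sketch below is an illustration in base R, not something provided by AnnotationDbi; the function name `keep_human_ensembl` is made up, and the pattern assumes human gene IDs have the form `ENSG` followed by 11 digits.

```r
# Hypothetical helper: keep rows whose Ensembl gene ID looks human.
keep_human_ensembl <- function(df, col = "ENSEMBL") {
    is_human <- grepl("^ENSG[0-9]{11}$", df[[col]])
    df[is_human, , drop = FALSE]
}

# Toy data frame shaped like the select() result above.
res <- data.frame(
    SYMBOL  = c("GCG", "GCG"),
    ENSEMBL = c("ENSPTRG00000000777", "ENSG00000115263"),
    stringsAsFactors = FALSE
)
keep_human_ensembl(res)
```

Because mapIds() returns only the first match per key by default, the filter would have to be applied to the full select() output, not to the mapIds() result.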


Session info (not technically R 3.6, but I didn't think that would have 
been the cause):



R Under development (unstable) (2019-04-11 r76379)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so
LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] org.Hs.eg.db_3.8.0   AnnotationDbi_1.45.1 IRanges_2.17.5
[4] S4Vectors_0.21.23    Biobase_2.43.1       BiocGenerics_0.29.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1      digest_0.6.18   DBI_1.0.0       RSQLite_2.1.1
 [5] blob_1.1.1      bit64_0.9-7     bit_1.1-14      compiler_3.7.0
 [9] pkgconfig_2.0.2 memoise_1.1.0


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel