Re: [Bioc-devel] Bioconductor 3.19 db0s, OrgDbs, and TxDbs now available

2024-04-18 Thread James W. MacDonald
We have to re-generate these packages - there was an error that excluded 65 
genes (total) from nine of the species. They will be available early next week.

Sorry for the delay!

Best,

Jim



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Bioconductor 3.19 db0s, OrgDbs, and TxDbs now available

2024-03-28 Thread Robert Castelo

Re: [Bioc-devel] Bioconductor 3.19 db0s, OrgDbs, and TxDbs now available

2024-03-28 Thread James W. MacDonald
There are EnsDbs for Ensembl builds 87-111 that Johannes Rainer submits to the 
AnnotationHub for those who want to use Ensembl mappings. And a direct build of 
a TxDb using a UCSC GTF file has the chrM genes as well.

> ensdb <- hub[["AH100643"]]
> subsetByOverlaps(transcriptsBy(ensdb), GRanges("MT:1-16569"))
GRangesList object of length 37:
$ENSG00000198695
GRanges object with 1 range and 11 metadata columns:
      seqnames      ranges strand |           tx_id     tx_biotype
  [1]       MT 14149-14673      - | ENST00000361681 protein_coding
      tx_cds_seq_start tx_cds_seq_end         gene_id tx_support_level
  [1]            14149          14673 ENSG00000198695             <NA>
          tx_id_version gc_content tx_external_name tx_is_canonical
  [1] ENST00000361681.2    42.6667       MT-ND6-201               1
              tx_name
  [1] ENST00000361681
  -------
  seqinfo: 457 sequences (1 circular) from GRCh38 genome


> z <- makeTxDbFromGFF("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.knownGene.gtf.gz")
Import genomic features from the file as a GRanges object ... trying URL 
'https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.knownGene.gtf.gz'
Content type 'application/x-gzip' length 38959957 bytes (37.2 MB)
==
downloaded 37.2 MB

OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
> subsetByOverlaps(transcriptsBy(z), GRanges("chrM:1-16569"))
GRangesList object of length 37:
$ENST00000386347.1
GRanges object with 1 range and 2 metadata columns:
      seqnames    ranges strand |  tx_id           tx_name
  [1]     chrM 3230-3304      + | 252881 ENST00000386347.1
  -------
  seqinfo: 439 sequences (1 circular) from an unspecified genome; no seqlengths


As with most human endeavors, the weight of history hangs heavy on 
Bioconductor, and it’s often easier to understand how the machine works than 
to try to change things that are set in stone. I mean, I have tried many 
(many) times to get affy removed in favor of oligo, to no avail.


Re: [Bioc-devel] Bioconductor 3.19 db0s, OrgDbs, and TxDbs now available

2024-03-28 Thread Tim Triche, Jr.
Is this an argument in favor of using ENSEMBL gene and transcript IDs
rather than ENTREZ or UCSC? Or just changing the way that the databases
are keyed? There really ought not to be transcripts for a gene on a
different chromosome from the gene, although the MHC and KIR loci (with alt
contigs) somewhat force the issue for that. (We could discuss graph
genomes here, but we aren't going to do that, because all
gene->transcript->contig mappings start to break.)

Omission of an entire chromosome seems... bad? Regardless of the technical
reason why. There are arguments in favor of e.g. gene -> transcript ->
contig where each relationship is potentially 1:many, but if chrM can't be
sorted out then I am dubious that more complicated mappings can be
efficiently handled. chrM is particularly weird in that it can have
multiple haplotypes (i.e. contigs) even within the same cell, but at some
point, simplifications are merited.


--t



Re: [Bioc-devel] Bioconductor 3.19 db0s, OrgDbs, and TxDbs now available

2024-03-28 Thread Tim Triche, Jr.
yep, looks like that's the problem

I wonder why it doesn't have a CDS chromosome in ENSEMBL?  That's a bit
nuts.

It is indeed indexed to MT in the canonical ENSEMBL database:
https://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000198727;r=MT:14747-15887;t=ENST00000361789


--t




Re: [Bioc-devel] Bioconductor 3.19 db0s, OrgDbs, and TxDbs now available

2024-03-28 Thread James W. MacDonald
As well as

> subsetByOverlaps(transcripts(Homo.sapiens), GRanges("chrM:1-16569"))
'select()' returned 1:1 mapping between keys and columns
GRanges object with 37 ranges and 2 metadata columns:
       seqnames      ranges strand |   TXID            TXNAME
   [1]     chrM     577-647      + | 252799 ENST00000387314.1
   [2]     chrM    648-1601      + | 252800 ENST00000389680.2
   [3]     chrM   1602-1670      + | 252801 ENST00000387342.1
   [4]     chrM   1671-3229      + | 252802 ENST00000387347.2
   [5]     chrM   3230-3304      + | 252803 ENST00000386347.1
   ...      ...         ...    ... .    ...               ...
  [33]     chrM   5826-5891      - | 252831 ENST00000387409.1
  [34]     chrM   7446-7514      - | 252832 ENST00000387416.2
  [35]     chrM 14149-14673      - | 252833 ENST00000361681.2
  [36]     chrM 14674-14742      - | 252834 ENST00000387459.1
  [37]     chrM 15956-16023      - | 252835 ENST00000387461.2
  -------
  seqinfo: 711 sequences (1 circular) from hg38 genome

However

> subsetByOverlaps(transcriptsBy(Homo.sapiens), GRanges("chrM:1-16569"))
GRangesList object of length 0:
<0 elements>

And

> subsetByOverlaps(transcripts(Homo.sapiens, columns = c("GENEID","SYMBOL")),
+                  GRanges("chrM:1-16569"))
'select()' returned 1:1 mapping between keys and columns
GRanges object with 37 ranges and 2 metadata columns:
       seqnames      ranges strand | GENEID SYMBOL
   [1]     chrM     577-647      + |   <NA>   <NA>
   [2]     chrM    648-1601      + |   <NA>   <NA>
   [3]     chrM   1602-1670      + |   <NA>   <NA>
   [4]     chrM   1671-3229      + |   <NA>   <NA>
   [5]     chrM   3230-3304      + |   <NA>   <NA>
   ...      ...         ...    ... .    ...    ...
  [33]     chrM   5826-5891      - |   <NA>   <NA>
  [34]     chrM   7446-7514      - |   <NA>   <NA>
  [35]     chrM 14149-14673      - |   <NA>   <NA>
  [36]     chrM 14674-14742      - |   <NA>   <NA>
  [37]     chrM 15956-16023      - |   <NA>   <NA>
  -------
  seqinfo: 711 sequences (1 circular) from hg38 genome

Everything is mapped via the GENEID, and if you query the UCSC genome browser 
for hg38/knownGene, asking for gene name, known gene ID and gene symbol, you 
will get the first and last but not the middle. 
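
A minimal base-R sketch of why the flat and grouped queries can disagree, 
assuming the grouped view is keyed on the gene ID and that records without a 
gene ID are dropped during grouping. The table, the transcript names, and the 
gene IDs below are made up for illustration; this uses split() as an analogy 
and is not the TxDb or OrganismDbi implementation.

```r
# Toy transcript table; names, chromosomes, and gene IDs are invented.
tx <- data.frame(
  tx_name = c("txA", "txB", "txC", "txD"),
  seqname = c("chr1", "chr1", "chrM", "chrM"),
  gene_id = c("g1", "g1", NA, NA),
  stringsAsFactors = FALSE
)

# Flat view: every transcript is present, including those with no gene ID.
flat_chrM <- tx[tx$seqname == "chrM", ]

# Grouped view: split() drops rows whose grouping key is NA,
# so no group containing a chrM transcript survives.
by_gene <- split(tx, tx$gene_id)
grouped_chrM <- Filter(function(d) any(d$seqname == "chrM"), by_gene)

nrow(flat_chrM)       # 2
length(grouped_chrM)  # 0
```

In this toy case the transcripts are visible when listed directly but vanish 
once they are grouped by a key they do not have, which matches the pattern in 
the output above.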




Re: [Bioc-devel] Bioconductor 3.19 db0s, OrgDbs, and TxDbs now available

2024-03-28 Thread Vincent Carey
winging it here tim

> select(Homo.sapiens, keys="ENSG00000198727", keytype="ENSEMBL",
+        columns=c("GENENAME", "GENEID", "CDSCHROM", "SYMBOL"))
'select()' returned 1:1 mapping between keys and columns
          ENSEMBL     GENENAME SYMBOL CDSCHROM GENEID
1 ENSG00000198727 cytochrome b   CYTB     <NA>   4519
> select(Homo.sapiens, keys="MTCYBP1", keytype="SYMBOL",
+        columns=c("GENENAME", "GENEID", "CDSCHROM", "SYMBOL"))
'select()' returned 1:1 mapping between keys and columns
   SYMBOL            GENENAME CDSCHROM    GENEID
1 MTCYBP1 MT-CYB pseudogene 1     <NA> 100499418

relevant?



Re: [Bioc-devel] Bioconductor 3.19 db0s, OrgDbs, and TxDbs now available

2024-03-28 Thread Tim Triche, Jr.
Hi Lori and fellow maintainers,

I had a strange experience yesterday where I pulled down genes and transcripts 
from Homo.sapiens, only to discover that all mitochondrial encoded genes 
(MT-CYB, MT-CO2, etc) were missing. 

Is there an historical reason why this is so? Obviously these transcripts are 
physiologically important, but beyond that, they’re also used all the time in 
single cell sequencing to estimate viability. 

Best,

--t
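
A minimal base-R sketch of the viability check mentioned above: the share of 
each cell's counts that comes from mitochondrial genes. The count matrix is 
invented for illustration, and selecting genes by an "MT-" symbol prefix is an 
assumption that only holds when the annotation supplies those symbols.

```r
# Toy count matrix (genes x cells); all numbers are invented.
counts <- matrix(
  c(90, 50,
    10, 30,
     0, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ACTB", "MT-CYB", "MT-CO2"), c("cell1", "cell2"))
)

# Assumption: mitochondrial genes are identifiable by an "MT-" symbol prefix.
is_mito <- grepl("^MT-", rownames(counts))

# Percentage of each cell's total counts that comes from mitochondrial genes.
pct_mito <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)
pct_mito  # cell1: 10, cell2: 50
```

If the annotation carries no mitochondrial genes, is_mito is FALSE everywhere 
and a check like this cannot flag any cell.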



[Bioc-devel] Bioconductor 3.19 db0s, OrgDbs, and TxDbs now available

2024-03-28 Thread Kern, Lori via Bioc-devel
Hello Bioconductor community,

The newest db0, OrgDb, and TxDb annotation packages for the upcoming 
Bioconductor 3.19 release are up and available for download in the devel 
version of Bioconductor.

The deadline for submitting contributed annotation packages will be Wednesday, 
April 17th.

The new db0 packages are:

anopheles.db0_3.19.0.tar.gz
arabidopsis.db0_3.19.0.tar.gz
bovine.db0_3.19.0.tar.gz
canine.db0_3.19.0.tar.gz
chicken.db0_3.19.0.tar.gz
chimp.db0_3.19.0.tar.gz
ecoliK12.db0_3.19.0.tar.gz
ecoliSakai.db0_3.19.0.tar.gz
fly.db0_3.19.0.tar.gz
human.db0_3.19.0.tar.gz
malaria.db0_3.19.0.tar.gz
mouse.db0_3.19.0.tar.gz
pig.db0_3.19.0.tar.gz
rat.db0_3.19.0.tar.gz
rhesus.db0_3.19.0.tar.gz
worm.db0_3.19.0.tar.gz
xenopus.db0_3.19.0.tar.gz
yeast.db0_3.19.0.tar.gz
zebrafish.db0_3.19.0.tar.gz

The new OrgDb packages are:

GO.db_3.19.0.tar.gz
org.Ag.eg.db_3.19.0.tar.gz
org.At.tair.db_3.19.0.tar.gz
org.Bt.eg.db_3.19.0.tar.gz
org.Ce.eg.db_3.19.0.tar.gz
org.Cf.eg.db_3.19.0.tar.gz
org.Dm.eg.db_3.19.0.tar.gz
org.Dr.eg.db_3.19.0.tar.gz
org.EcK12.eg.db_3.19.0.tar.gz
org.EcSakai.eg.db_3.19.0.tar.gz
org.Gg.eg.db_3.19.0.tar.gz
org.Hs.eg.db_3.19.0.tar.gz
org.Mm.eg.db_3.19.0.tar.gz
org.Mmu.eg.db_3.19.0.tar.gz
org.Pt.eg.db_3.19.0.tar.gz
org.Rn.eg.db_3.19.0.tar.gz
org.Sc.eg.db_3.19.0.tar.gz
org.Ss.eg.db_3.19.0.tar.gz
org.Xl.eg.db_3.19.0.tar.gz
Orthology.eg.db_3.19.0.tar.gz
PFAM.db_3.19.0.tar.gz

The new TxDb packages are:

TxDb.Hsapiens.UCSC.hg38.refGene_3.19.0.tar.gz
TxDb.Mmusculus.UCSC.mm39.refGene_3.19.0.tar.gz

Thank you


Lori Shepherd - Kern

Bioconductor Core Team

Roswell Park Comprehensive Cancer Center

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263

