Re: [Bioc-devel] Wrong skipping of tests when building on Bioconductor and R CMD check timeout

2023-12-12 Thread Jacopo Ronchi
Dear Hervé,

Thank you very much for your answer. Regarding the issue my package
encounters when building on the SPB, I had the same suspicion. Indeed, when I
set that variable locally in my .Renviron file, everything works as
expected (tests that should be skipped on Bioconductor are indeed skipped).
So the slight differences in environment variables between the two build
systems may well be the explanation.
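For anyone wanting to reproduce this without editing .Renviron permanently, a minimal sketch is to set the variable only for the duration of one test run. It assumes the withr and testthat (3rd edition) packages are installed and that the builders set IS_BIOC_BUILD_MACHINE to "true"; `run_tests_like_bioc()` is a hypothetical helper, not part of either package.

```r
# Hypothetical helper: run a package's testthat suite with IS_BIOC_BUILD_MACHINE
# set to "true" (the variable testthat::skip_on_bioc() checks), mimicking the
# BioC daily builders. withr restores the previous value when the call returns.
run_tests_like_bioc <- function(pkg_path = ".") {
    withr::with_envvar(
        c(IS_BIOC_BUILD_MACHINE = "true"),
        testthat::test_local(pkg_path)
    )
}

# Usage, from the package root:
# run_tests_like_bioc()
```

Because withr::with_envvar() only changes the variable inside the call, tests guarded by skip_on_bioc() keep running in ordinary local sessions.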

As for the timeout, I had not considered caching the resources used in the
examples. Since I already use BiocFileCache in my package, I will extend it
to the other features used in the examples. Thank you very much for this
very useful suggestion.

Kind regards,
Jacopo

On Wed 13 Dec 2023, 00:00 Hervé Pagès wrote:

> Hi Jacopo,
>
> testthat::skip_on_bioc() relies on the IS_BIOC_BUILD_MACHINE environment
> variable to know whether it's on a BioC build machine or not.
>
> This environment variable is defined during the daily build via the
> Renviron.bioc file. Note that a link to this file is provided on the
> individual build reports e.g. here
> https://bioconductor.org/checkResults/3.19/bioc-LATEST/Biobase/
> ("Renviron settings" link).
>
> Maybe this environment variable is not defined on the Single Package
> Builder (SPB)? The SPB is the build system used during the package
> submission process. It runs on the same machines as the daily builds but my
> understanding is that it uses a slightly different set of variables. Maybe
> Lori can shed some light?
>
> As for the timeout on merida1 (Intel Mac), have you considered using
> BiocFileCache to cache the data that you download in your examples? You
> might still get a timeout the next time 'R CMD check' will run on our build
> machines, but it should go significantly faster after that.
>
> Best,
>
> H.
> --
> Hervé Pagès
>
> Bioconductor Core Team
> hpages.on.git...@gmail.com
>
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Wrong skipping of tests when building on Bioconductor and R CMD check timeout

2023-12-12 Thread Hervé Pagès
Hi Jacopo,

testthat::skip_on_bioc() relies on the IS_BIOC_BUILD_MACHINE environment 
variable to know whether it's on a BioC build machine or not.

This environment variable is defined during the daily build via the 
Renviron.bioc file. Note that a link to this file is provided on the 
individual build reports e.g. here 
https://bioconductor.org/checkResults/3.19/bioc-LATEST/Biobase/ 
("Renviron settings" link).

Maybe this environment variable is not defined on the Single Package 
Builder (SPB)? The SPB is the build system used during the package 
submission process. It runs on the same machines as the daily builds but 
my understanding is that it uses a slightly different set of variables. 
Maybe Lori can shed some light?
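To illustrate the mechanism described above, here is a base-R sketch approximating the checks testthat 3.x performs; the helper names are hypothetical and this is not testthat's own code. It also covers skip_on_cran(), which skips any non-interactive run where NOT_CRAN is not "true"; that would be consistent with the "On CRAN (2)" lines in the report if NOT_CRAN is also not set on the SPB.

```r
# Hypothetical helpers approximating testthat 3.x skip logic (not its source).

# A variable counts as set only if it parses as TRUE ("true", "TRUE", ...);
# an unset variable is treated as "false".
env_var_is_true <- function(name) {
    isTRUE(as.logical(Sys.getenv(name, unset = "false")))
}

# skip_on_bioc(): skip when IS_BIOC_BUILD_MACHINE is true (set via Renviron.bioc).
would_skip_on_bioc <- function() env_var_is_true("IS_BIOC_BUILD_MACHINE")

# skip_on_cran(): skip non-interactive runs unless NOT_CRAN is true.
would_skip_on_cran <- function() !interactive() && !env_var_is_true("NOT_CRAN")

# Simulate a build system whose environment defines neither variable.
invisible(Sys.unsetenv(c("IS_BIOC_BUILD_MACHINE", "NOT_CRAN")))
would_skip_on_bioc()  # FALSE: tests guarded by skip_on_bioc() still run
would_skip_on_cran()  # TRUE under Rscript / R CMD check (non-interactive)
```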

As for the timeout on merida1 (Intel Mac), have you considered using 
BiocFileCache to cache the data that you download in your examples? You 
might still get a timeout the next time 'R CMD check' will run on our 
build machines, but it should go significantly faster after that.
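A minimal sketch of that approach, assuming the BiocFileCache package is installed; `cached_download()` is a hypothetical helper and the URL in the usage comment is a placeholder, not a real resource:

```r
# Hypothetical helper: return a local path for `url`, downloading it only the
# first time. Later calls (including later R CMD check runs on the same
# machine) reuse the copy stored in the user's BiocFileCache directory.
cached_download <- function(url) {
    # ask = FALSE avoids an interactive prompt when the cache directory does
    # not exist yet, which matters in non-interactive R CMD check runs.
    bfc <- BiocFileCache::BiocFileCache(ask = FALSE)
    # bfcrpath() looks the resource up by name and adds (downloads) it
    # when it is not already in the cache.
    BiocFileCache::bfcrpath(bfc, url)
}

# Usage inside an example or function (placeholder URL):
# path <- cached_download("https://example.org/gene_sets.gmt")
```

Since resources are looked up by their URL, a file that changes upstream is not re-downloaded automatically; BiocFileCache::bfcneedsupdate() can be used to check for that if it matters.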

Best,

H.

On 12/12/23 07:22, Jacopo Ronchi wrote:
> Dear Developers,
>
> I am currently in the process of submitting my package on Bioconductor and
> I am facing some issues during the R CMD check on the Bioconductor Build
> System. Since I was not able to find any answers to my doubts, I decided to
> ask for your help before doing anything wrong.
>
> The build report for my package is available here:
> http://bioconductor.org/spb_reports/MIRit_buildreport_20231211095232.html
>
> In particular, my package includes some functions that access remote
> resources. Therefore, I added skip_on_bioc() calls at the beginning of the
> corresponding tests, since I don't want my package to fail during the build
> process because of occasional downtimes of those resources. However, the
> build report shows that these tests are not skipped. Furthermore, other
> tests that should run are instead reported as skipped "On CRAN". I am
> referring to these lines:
>
> Skipped tests (2)
> On CRAN (2): 'test-topological-integration.R:23:5', 'test-utils.R:20:5'
>
> Lastly, R CMD check fails on macOS because of a timeout, and I really don't
> know how to reduce the running time on this operating system. I have
> already reshaped the test suite to reduce the time spent on unit tests.
> However, on macOS, I guess that most of the time is consumed by the
> examples. The most time-consuming functions retrieve gene sets from
> external resources, and I can't reduce the download size of KEGG
> pathways, for example. What should I do?
>
> Sorry again for bothering you,
> Best regards,
> Jacopo

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects

2023-12-12 Thread Hervé Pagès
FWIW I've documented the process of making a TxDb object for 
T2T-CHM13v2.0 here:

https://github.com/Bioconductor/GenomicFeatures/issues/65

Please comment there for any follow-up.

Note that we're considering wrapping this in a TxDb package that we'll 
make available to the community. It's a work in progress.

Thanks!

H.

On 12/12/23 07:29, James W. MacDonald wrote:
> Hi Christian,
>
> This conversation is off-topic, both for this listserv (it’s meant to help 
> people developing Bioconductor packages) and for the support site (which is 
> meant to help people using, again, Bioconductor packages). I’ll answer your 
> questions one more time, but if you have other questions, please move to 
> biostars.org, or just ask the ArchR people directly, since it’s their package.
>
> I believe you are misinterpreting what an OrgDb is intended to provide. There 
> is no positional data in an OrgDb, and what the CHM13 project has done is 
> completely positional (the data provided in the ‘Gene Annotation’ section of 
> the CHM13 GitHub repository are all GFF files, which are meant to provide 
> positional information for genes on a genome).
>
> The OrgDb package provides functional and within-annotation mappings. You can 
> map an NCBI Gene ID to Ensembl, or to the HGNC gene symbol, or a GO term, 
> etc. For example, I can map gene symbol P53 to NCBI Gene ID 7157, or to its 
> UniProt accession K7PPA8. If the new genome build says P53 has moved to a new 
> genomic position, that has no effect on what UniProt thinks the ID for that 
> gene’s protein should be, or what ID NCBI uses, or what GO terms are 
> associated with that gene. Functionally it’s the same gene. We just might 
> think it is located in a different place in the genome.
>
> The difference between CHM13 and GRCh38 is not materially different from the 
> difference between GRCh37 and GRCh38 (they represent the current knowledge of 
> the genome at a point in time), and while we supply TxDb packages for GRCh38 
> and GRCh37 (and variants based on NCBI’s mappings as well as Ensembl’s 
> mappings), we have never supplied more than one human OrgDb package, because 
> the positional and functional information are orthogonal.
>
> It seems pretty simple to make what you need though.
>
>> library(GenomicFeatures)
>> tx <- makeTxDbFromGFF("https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz")
> Import genomic features from the file as a GRanges object ... trying URL 
> 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz'
> Content type 'application/x-gzip' length 79009538 bytes (75.3 MB)
> downloaded 75.3 MB
>
> OK
> Prepare the 'metadata' data frame ... OK
> Make the TxDb object ... OK
> Warning messages:
> 1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
>   some transcripts have no
>   "transcript_id" attribute ==>
>   their name ("tx_name" column in
>   the TxDb object) was set to NA
> 2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
>   the transcript names ("tx_name"
>   column in the TxDb object)
>   imported from the
>   "transcript_id" attribute are
>   not unique
> 3: In .find_exon_cds(exons, cds) : The following transcripts have
>   exons that contain more than one
>   CDS (only the first CDS was kept
>   for each exon):
>   rna-NM_001134939.1,
>   rna-NM_001172437.2,
>   rna-NM_001184961.1,
>   rna-NM_001301020.1,
>   rna-NM_001301302.1,
>   rna-NM_001301371.1,
>   rna-NM_002537.3,
>   rna-NM_004152.3,
>   rna-NM_015068.3, rna-NM_016178.2
>> tx
> TxDb object:
> # Db type: TxDb
> # Supporting package: GenomicFeatures
> # Data source: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz
> # Organism: NA
> # Taxonomy ID: NA
> # miRBase build ID: NA
> # Genome: NA
> # Nb of transcripts: 188205
> # Db created by: GenomicFeatures package from Bioconductor
> # Creation time: 2023-12-12 10:17:34 -0500 (Tue, 12 Dec 2023)
> # GenomicFeatures version at creation time: 1.54.1
> # RSQLite version at creation time: 2.3.1
> # DBSCHEMAVERSION: 1.2
>
> library(ArchR)
> library(BSgenome.Hsapiens.NCBI.T2T.CHM13v2.0)
> library(org.Hs.eg.db)
> genomeAnnotation <- createGenomeAnnotation(BSgenome.Hsapiens.NCBI.T2T.CHM13v2.0)
> geneAnnotation <- createGeneAnnotation(TxDb = tx, OrgDb = org.Hs.eg.db)
>
>
> Best,
>
> Jim
>
> From: Christian Arnold
> Sent: Tuesday, December 12, 2023 9:35 AM
> To: Vincent Carey; James W. MacDonald
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects
>

Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects

2023-12-12 Thread James W. MacDonald
Hi Christian,

This conversation is off-topic, both for this listserv (it’s meant to help 
people developing Bioconductor packages) and for the support site (which is 
meant to help people using, again, Bioconductor packages). I’ll answer your 
questions one more time, but if you have other questions, please move to 
biostars.org, or just ask the ArchR people directly, since it’s their package.

I believe you are misinterpreting what an OrgDb is intended to provide. There 
is no positional data in an OrgDb, and what the CHM13 project has done is 
completely positional (the data provided in the ‘Gene Annotation’ section of 
the CHM13 GitHub repository are all GFF files, which are meant to provide 
positional information for genes on a genome).

The OrgDb package provides functional and within-annotation mappings. You can 
map an NCBI Gene ID to Ensembl, or to the HGNC gene symbol, or a GO term, etc. 
For example, I can map gene symbol P53 to NCBI Gene ID 7157, or to its UniProt 
accession K7PPA8. If the new genome build says P53 has moved to a new genomic 
position, that has no effect on what UniProt thinks the ID for that gene’s 
protein should be, or what ID NCBI uses, or what GO terms are associated with 
that gene. Functionally it’s the same gene. We just might think it is located 
in a different place in the genome.
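As a small illustration of these identifier mappings, a sketch assuming the AnnotationDbi and org.Hs.eg.db packages are installed (`map_gene_ids()` is a hypothetical helper); note that nothing in the query refers to a genome build or to genomic coordinates:

```r
# Hypothetical helper: look up other identifiers for NCBI Gene IDs in the
# human OrgDb. Only identifier/functional mappings are involved; no genomic
# coordinates (and hence no genome build) enter the query.
map_gene_ids <- function(entrez_ids,
                         columns = c("SYMBOL", "ENSEMBL", "UNIPROT")) {
    AnnotationDbi::select(
        org.Hs.eg.db::org.Hs.eg.db,
        keys = entrez_ids,
        keytype = "ENTREZID",
        columns = columns
    )
}

# Usage:
# map_gene_ids("7157")
```

Keys with several matches (for example multiple Ensembl or UniProt IDs) come back as several rows, and select() reports a 1:many mapping.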

The difference between CHM13 and GRCh38 is not materially different from the 
difference between GRCh37 and GRCh38 (they represent the current knowledge of 
the genome at a point in time), and while we supply TxDb packages for GRCh38 
and GRCh37 (and variants based on NCBI’s mappings as well as Ensembl’s 
mappings), we have never supplied more than one human OrgDb package, because 
the positional and functional information are orthogonal.

It seems pretty simple to make what you need though.

> library(GenomicFeatures)
> tx <- makeTxDbFromGFF("https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz")
Import genomic features from the file as a GRanges object ... trying URL 
'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz'
Content type 'application/x-gzip' length 79009538 bytes (75.3 MB)
downloaded 75.3 MB

OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  some transcripts have no
  "transcript_id" attribute ==>
  their name ("tx_name" column in
  the TxDb object) was set to NA
2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  the transcript names ("tx_name"
  column in the TxDb object)
  imported from the
  "transcript_id" attribute are
  not unique
3: In .find_exon_cds(exons, cds) : The following transcripts have
  exons that contain more than one
  CDS (only the first CDS was kept
  for each exon):
  rna-NM_001134939.1,
  rna-NM_001172437.2,
  rna-NM_001184961.1,
  rna-NM_001301020.1,
  rna-NM_001301302.1,
  rna-NM_001301371.1,
  rna-NM_002537.3,
  rna-NM_004152.3,
  rna-NM_015068.3, rna-NM_016178.2
> tx
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 188205
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2023-12-12 10:17:34 -0500 (Tue, 12 Dec 2023)
# GenomicFeatures version at creation time: 1.54.1
# RSQLite version at creation time: 2.3.1
# DBSCHEMAVERSION: 1.2

library(ArchR)
library(BSgenome.Hsapiens.NCBI.T2T.CHM13v2.0)
library(org.Hs.eg.db)
genomeAnnotation <- createGenomeAnnotation(BSgenome.Hsapiens.NCBI.T2T.CHM13v2.0)
geneAnnotation <- createGeneAnnotation(TxDb = tx, OrgDb = org.Hs.eg.db)
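For a pipeline, one possible variation (a sketch, not part of the reply above) is to build the TxDb once, record the organism that shows as NA in the output above, and save it to a SQLite file that later runs can load. It assumes the GenomicFeatures and AnnotationDbi packages as of the GenomicFeatures 1.54 release shown above (in later Bioconductor releases makeTxDbFromGFF() moved to the txdbmaker package); `chm13_txdb()` and its default file name are hypothetical.

```r
# Hypothetical helper: build a TxDb from the NCBI T2T-CHM13v2.0 GFF once,
# save it as a SQLite file, and load that file on subsequent calls.
chm13_txdb <- function(db_file = "TxDb.T2T-CHM13v2.0.sqlite") {
    if (file.exists(db_file)) {
        return(AnnotationDbi::loadDb(db_file))
    }
    gff_url <- paste0(
        "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/",
        "GCF_009914755.1_T2T-CHM13v2.0/",
        "GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz"
    )
    # Setting 'organism' fills the "Organism" field reported as NA above.
    txdb <- GenomicFeatures::makeTxDbFromGFF(gff_url, organism = "Homo sapiens")
    AnnotationDbi::saveDb(txdb, file = db_file)
    txdb
}

# Usage (then pass the result to createGeneAnnotation() as above):
# tx <- chm13_txdb()
```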


Best,

Jim

From: Christian Arnold 
Sent: Tuesday, December 12, 2023 9:35 AM
To: Vincent Carey; James W. MacDonald
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects


Dear Vincent and others,

thanks for the reply! Irrespective of whether a different OrgDb is required, 
the name itself suggested that there "should be" corresponding OrgDb and 
TxDb packages as well. I can build one on my own, I see; is anyone working 
on providing the TxDb object for Bioc?

I am also asking this because the T2T 

[Bioc-devel] Wrong skipping of tests when building on Bioconductor and R CMD check timeout

2023-12-12 Thread Jacopo Ronchi
Dear Developers,

I am currently in the process of submitting my package on Bioconductor and
I am facing some issues during the R CMD check on the Bioconductor Build
System. Since I was not able to find any answers to my doubts, I decided to
ask for your help before doing anything wrong.

The build report for my package is available here:
http://bioconductor.org/spb_reports/MIRit_buildreport_20231211095232.html

In particular, my package includes some functions that access remote
resources. Therefore, I added skip_on_bioc() calls at the beginning of the
corresponding tests, since I don't want my package to fail during the build
process because of occasional downtimes of those resources. However, the
build report shows that these tests are not skipped. Furthermore, other
tests that should run are instead reported as skipped "On CRAN". I am
referring to these lines:

Skipped tests (2)
   On CRAN (2): 'test-topological-integration.R:23:5', 'test-utils.R:20:5'

Lastly, R CMD check fails on macOS because of a timeout, and I really don't
know how to reduce the running time on this operating system. I have
already reshaped the test suite to reduce the time spent on unit tests.
However, on macOS, I guess that most of the time is consumed by the
examples. The most time-consuming functions retrieve gene sets from
external resources, and I can't reduce the download size of KEGG
pathways, for example. What should I do?

Sorry again for bothering you,
Best regards,
Jacopo



Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects

2023-12-12 Thread Christian Arnold via Bioc-devel
Dear Vincent and others,

thanks for the reply! Irrespective of whether a different OrgDb is
required, the name itself suggested that there "should be" corresponding
OrgDb and TxDb packages as well. I can build one on my own, I see; is
anyone working on providing the TxDb object for Bioc?

I am also asking this because the T2T people specifically provide an
"updated" gene annotation dataset, which may differ from, and possibly be
incompatible with, what is inside the OrgDb. From the CHM13 README
(https://github.com/marbl/CHM13):

"JHU RefSeqv110 + Liftoff v5.1: This contains curated annotations of the
ampliconic genes on the Y chromosome, correcting annotation errors in
GENCODEv35 CAT/Liftoff and RefSeqv110 annotation. Additional copies found
in T2T-Y were annotated to the closest available gene in RefSeq, allowing
multiple genes to have the same common name. This file has been modified
to correct special character issues from the original file."

For ArchR, I tried to understand how one can create a custom genome by
checking https://www.archrproject.com/bookdown/getting-set-up.html. There,
they explicitly mention the TxDb and OrgDb objects that are needed for
building a custom genome. There seems to be another option when either of
the two is not available ("Alternatively, if you don't have a TxDb and
OrgDb object, you can create a geneAnnotation object from the following
information"), but I first tried to do it the easy way, as I want to embed
it properly in a pipeline with as little "custom" code as possible.


Thanks,
Christian




On 11/12/2023 15:30, Vincent Carey wrote:
> Thanks Jim, I tend to agree with you. Christian, I had a look at ArchR
> but could not tell where the system contacts the Bioc annotation
> elements. Can you give some hints? I'd like to be able to verify
> compatibility.
>
> On Mon, Dec 11, 2023 at 9:19 AM James W. MacDonald  wrote:
>
> I don't believe a different OrgDb is required. The OrgDb package
> is meant to provide annotations for genes such as gene symbol or
> GO term, etc., which are orthogonal to the sequence of the genome,
> so the current version should suffice.
>
> -----Original Message-----
> From: Bioc-devel  On Behalf Of
> Vincent Carey
> Sent: Sunday, December 10, 2023 1:44 PM
> To: Christian Arnold 
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] Missing CHM13v2.0 TxDB and OrgDb objects
>
> Good question. I believe these will be forthcoming soon. In the
> meantime you can create your own. See, for example
>
> https://github.com/vjcitn/BiocT2T/blob/devel/inst/scripts/makeTxDb.R
>
> It's an active area, so you can pull a GFF file from
> https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=T2T/CHM13/assemblies/annotation/
> and adjust the code noted above for the TxDb.
>
> For the OrgDb I have to get back to you.
>
> On Sun, Dec 10, 2023 at 12:06 PM Christian Arnold via Bioc-devel <bioc-devel@r-project.org> wrote:
>
> > Hello, I am working with the new human T2T-CHM13v2.0 assembly and
> > while a BSgenome package already exists
> > (BSgenome.Hsapiens.NCBI.T2T.CHM13v2.0), I could not find the
> > corresponding TxDb and OrgDb packages. Is there any information on
> > when they may also become available, so that it is easier to work
> > with the new genome in packages like ArchR, which support a custom
> > genome but need these standard annotation packages to create it?
> >
> >
> > Thanks a lot for any information regarding this!
> >
> > Best, Christian
