[Bioc-devel] Empty DataFrame Causes SummarizedExperiment Constructor Error

2023-05-12 Thread Dario Strbenac via Bioc-devel
Good day,

The default value of colData is DataFrame(). Leaving colData unspecified works:

library(SummarizedExperiment)
countsMini <- matrix(rpois(100, 100), ncol = 10)
colnames(countsMini) <- paste("Cell", 1:10)
rownames(countsMini) <- paste("Gene", 1:10)
SummarizedExperiment(assays = list(counts = countsMini)) # Creates the object successfully.

However, explicitly specifying an empty DataFrame triggers an error. I don't
understand why this is not equivalent to the constructor's default.

SummarizedExperiment(assays = list(counts = countsMini), colData = DataFrame())
Error in `rownames<-`(`*tmp*`, value = .get_colnames_from_first_assay(assays)) : 
  invalid rownames length

What is the subtle difference? It also seems that a clearer error message could
be emitted if this case were caught in the right place.
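One plausible mechanism, assuming the constructor treats an omitted colData differently from an explicitly supplied one (base R's missing() distinguishes the two even when the supplied value equals the default), can be sketched without Bioconductor. The function make_col_data() is hypothetical and uses a base data.frame in place of DataFrame; it also shows the kind of explicit row-count check that would give a clearer message.

```r
# Hypothetical sketch: the declared default is never used as-is. When colData
# is omitted, a zero-column table with one row per assay column is built;
# an explicitly supplied table must already have one row per assay column.
make_col_data <- function(assay, colData = data.frame()) {
  if (missing(colData)) {
    # Zero columns, but as many rows as the assay has columns.
    return(data.frame(row.names = colnames(assay)))
  }
  if (nrow(colData) != ncol(assay)) {
    stop("colData has ", nrow(colData), " rows but the assay has ",
         ncol(assay), " columns")
  }
  colData
}

countsMini <- matrix(rpois(100, 100), ncol = 10)
colnames(countsMini) <- paste("Cell", 1:10)

dim(make_col_data(countsMini))                          # 10 rows, 0 columns
try(make_col_data(countsMini, colData = data.frame()))  # explicit empty table fails
```

The check compares row and column counts before any row names are assigned, so the message names the actual mismatch instead of failing inside `rownames<-`.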

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] TCGAbiolinks fails

2023-04-28 Thread Dario Strbenac via Bioc-devel
Good day,

The package has check errors that its developers need to fix themselves.

Quitting from lines 114-121 (subtypes.Rmd) 
Error: processing vignette 'subtypes.Rmd' failed with diagnostics:
object 'lgg.gbm.subtype' not found

The installation error simply indicates that the package has never built 
successfully in Bioconductor 3.17.


Re: [Bioc-devel] S4 Methods Documentation Convention Triggers Warnings

2023-01-27 Thread Dario Strbenac via Bioc-devel
Good day,

So, is the ultimate solution to manually change everything to the format of

\item{\code{show(x)}:}{
  ...
} ?

The warnings persist, so it does not seem that R will revert to letting the
currently popular syntax pass its check.
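To test whether the labelled form passes, a minimal Rd page can be written out and run through tools::checkRd(). The page below is a hypothetical stand-in for GRanges-class.Rd, and it is an assumption (worth verifying on the R version in question) that checkRd() reports nothing for it. Raw strings (R 4.0 or later) avoid doubling every backslash.

```r
# Minimal, hypothetical Rd page documenting show(x) with the
# \item{label}{description} form inside \describe.
rd_lines <- c(
  r"(\name{GRanges-class})",
  r"(\alias{GRanges-class})",
  r"(\title{GRanges objects})",
  r"(\description{A toy page used only to exercise checkRd.})",
  r"(\section{Displaying}{)",
  r"(  In the code snippets below, \code{x} is a GRanges object.)",
  r"(  \describe{)",
  r"(    \item{\code{show(x)}:}{)",
  r"(      Displays the first five and last five elements.)",
  r"(    })",
  r"(  })",
  r"(})"
)

rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# parse_Rd() builds the Rd object; checkRd() returns any problems it finds.
problems <- tools::checkRd(tools::parse_Rd(rd_file))
print(problems)  # an empty result means no problems were reported
```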


[Bioc-devel] S4 Methods Documentation Convention Triggers Warnings

2022-11-26 Thread Dario Strbenac via Bioc-devel
Good day,

For a long time, it has been a convention to document S4 methods in the format:

\section{Displaying}{
  In the code snippets below, \code{x} is a GRanges object.
  \describe{
    \item{}{
      \code{show(x)}:
      Displays the first five and last five elements.
    }
  }
}

In R Under Development, this is now a warning:

* checking Rd files ... WARNING
checkRd: (5) GRanges-class.Rd:115-165: \item in \describe must have non-empty label.

This affects my own package as well as the core Bioconductor packages that I
used as inspiration when designing my package documentation seven years ago.
What should the new convention be? Or could the R developers be convinced to
remove this check before this development version is released?
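Before converting anything, it may help to list which lines are affected. The helper below is a hypothetical sketch: it only finds the literal text \item{}{ on a single line (the layout of the example above), so an empty label written with spaces or split across lines would be missed.

```r
# Hypothetical helper: list every line in the .Rd files of a man/ directory
# that contains the literal text \item{}{ (an \item with an empty label).
find_empty_item_labels <- function(man_dir = "man") {
  rd_files <- list.files(man_dir, pattern = "\\.Rd$", full.names = TRUE)
  hits <- lapply(rd_files, function(f) {
    # fixed = TRUE: match the backslash and braces literally, not as a regex.
    line_nos <- grep("\\item{}{", readLines(f, warn = FALSE), fixed = TRUE)
    if (length(line_nos) == 0) return(NULL)
    data.frame(file = basename(f), line = line_nos)
  })
  hits <- Filter(Negate(is.null), hits)
  if (length(hits) == 0) {
    return(data.frame(file = character(), line = integer()))
  }
  do.call(rbind, hits)
}

# Usage, run from the package root:
# find_empty_item_labels("man")
```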


Re: [Bioc-devel] DataFrameList to Wide Format DataFrame

2021-12-16 Thread Dario Strbenac via Bioc-devel
Hello,

Ah, yes, the sample names should of course be in the rows (a Friday afternoon
error). In the question, I specified "largely the same set of features",
implying that the overlap is not complete, so the example below fails.

library(S4Vectors)
DFL <- DataFrameList(X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
                     Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))
unlist(DFL)
Error in .aggregate_and_align_all_colnames(all_colnames, strict.colnames = strict.colnames) : 
  the DFrame objects to combine must have the same column names

This is long but works:

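# Union of the column names across all DataFrames.
allFeatures <- unique(unlist(lapply(DFL, colnames)))
# Add each missing column to each DataFrame, filled with NA.
DFL <- lapply(DFL, function(DF)
{
  missingFeatures <- setdiff(allFeatures, colnames(DF))
  DF[missingFeatures] <- NA
  DF
})
# Stack the padded DataFrames row-wise.
DFLflattened <- do.call(rbind, DFL)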

Is there a one-line function for it?
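The workaround above can be wrapped into a reusable function that also records which list element each row came from. The sketch below is hypothetical: it uses base data.frame objects as a stand-in for DataFrame, and the names stack_with_na() and names_col are illustrative, not part of S4Vectors.

```r
# Hypothetical helper: stack a named list of data frames that share most,
# but not all, columns. Missing columns are filled with NA and the list
# names are stored in an extra column.
stack_with_na <- function(dfs, names_col = "dataset") {
  all_cols <- unique(unlist(lapply(dfs, colnames)))
  padded <- Map(function(df, nm) {
    for (col in setdiff(all_cols, colnames(df))) df[[col]] <- NA
    df <- df[all_cols]                    # same column order in every piece
    df[[names_col]] <- rep(nm, nrow(df))  # remember the source of each row
    df
  }, dfs, names(dfs))
  # Drop the list names before rbind(); they are already stored in names_col.
  do.call(rbind, unname(padded))
}

dfs <- list(X = data.frame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
            Y = data.frame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))
stack_with_na(dfs)
```

Padding before binding keeps rbind() strict about column names while still accepting partially overlapping feature sets.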


[Bioc-devel] DataFrameList to Wide Format DataFrame

2021-12-16 Thread Dario Strbenac via Bioc-devel
Good day,

Is there a function in the S4Vectors API that converts a DataFrameList into a
DataFrame, automatically putting the list names into one of the metadata
columns, analogous to MultiAssayExperiment's wideFormat function? The scenario
is multiple data sets from different organisations measuring largely the same
set of features and patient outcomes, but on completely different sets of
patients in each organisation.


Re: [Bioc-devel] bpparam Non-deterministic Default

2021-11-26 Thread Dario Strbenac via Bioc-devel
Hello,

Could it instead be made possible to set an RNGseed value by passing one to
bpparam() while still getting the automatic back-end selection, so that it
could easily be set to a particular value in an R package?


[Bioc-devel] bpparam Non-deterministic Default

2021-11-26 Thread Dario Strbenac via Bioc-devel
Good day,

I maintain an R package that uses functions such as bplapply, which have
bpparam() as their default. I have received feedback from a beginner user that
the results changed when he knitted his R Markdown document a second time. This
stems from the default parameter object returned by bpparam(), which sets no
RNGseed. I am wondering about the desirability of changing the RNGseed default
in BiocParallel to a particular uncontroversial number, such as 12345, so that
beginners get deterministic behaviour.
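The effect of a fixed default seed on parallel code can be sketched with base R's parallel package as a stand-in; this swaps in mclapply() for bplapply() and does not use BiocParallel's own RNGseed mechanism. The function simulateScores() and its seed = 12345 default are hypothetical. RNGkind("L'Ecuyer-CMRG") gives each forked worker its own reproducible stream. Note that this changes the session's RNG kind as a side effect.

```r
library(parallel)

# Hypothetical package function: draws random numbers in parallel and uses a
# fixed default seed, so re-running it (e.g. re-knitting) gives the same values.
simulateScores <- function(n = 4, seed = 12345) {
  RNGkind("L'Ecuyer-CMRG")  # side effect: changes the session's RNG kind
  set.seed(seed)
  # Forking is unavailable on Windows, where mclapply() needs mc.cores = 1.
  cores <- if (.Platform$OS.type == "windows") 1L else 2L
  mclapply(seq_len(n), function(i) runif(1), mc.cores = cores)
}

identical(simulateScores(), simulateScores())
```

A user who wants different draws can still pass another seed, while the default run stays deterministic.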


[Bioc-devel] S4 Method Slow Execution if Signature Has Multiple Class Unions

2021-11-22 Thread Dario Strbenac via Bioc-devel
Good day,

I created two constructor methods for a generic function: one for the default
empty constructor and the other for when the user specifies one or more
parameters. The method signatures are:

1. c("missing", "missing", "missing", "missing", "missing", "missing",
     "missing", "missing"),
2. c("characterOrMissing", "numericOrMissing", "numericOrMissing",
     "numericOrMissing", "numericOrMissing", "characterOrMissing",
     "BiocParallelParamOrMissing", "numericOrMissing")

The class unions are defined as you might expect.

setClassUnion("characterOrMissing", c("character", "missing"))
setClassUnion("numericOrMissing", c("numeric", "missing"))
setClassUnion("BiocParallelParamOrMissing", c("BiocParallelParam", "missing"))

The first method works as expected:

> system.time(CrossValParams())
   user  system elapsed 
  0.165   0.000   0.165

The second takes over ten minutes and uses 100% CPU throughout, according to
top.

> system.time(CrossValParams("Leave-k-Out", leave = 2))
   user  system elapsed 
760.018  15.093 775.090

Strangely, if I rerun this code again, it works quickly the second time.

> system.time(CrossValParams("Leave-k-Out", leave = 2))
   user  system elapsed 
  0.145   0.000   0.145

I haven't been able to come up with a minimal reproducible example of the issue.
How can this be done consistently and efficiently?
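A first step towards a minimal reproducible example is to rebuild the same pattern with fewer arguments and time the first and the repeated call. The generic ToyParams(), its three arguments and their defaults are hypothetical stand-ins for CrossValParams(). The assumption is that the first call is slow because dispatch must search the inherited methods for a new class combination and that the result is then cached; with only three arguments, the slowdown may well not reproduce.

```r
library(methods)

setClassUnion("characterOrMissing", c("character", "missing"))
setClassUnion("numericOrMissing", c("numeric", "missing"))

# Hypothetical three-argument generic standing in for the eight-argument one.
setGeneric("ToyParams", function(type, leave, folds) standardGeneric("ToyParams"))

# Exact match for the empty constructor.
setMethod("ToyParams", c("missing", "missing", "missing"),
          function(type, leave, folds) list(type = "k-Fold", leave = 2, folds = 5))

# Reached only through the class unions (inherited dispatch).
setMethod("ToyParams", c("characterOrMissing", "numericOrMissing", "numericOrMissing"),
          function(type, leave, folds) {
            if (missing(type)) type <- "k-Fold"
            if (missing(leave)) leave <- 2
            if (missing(folds)) folds <- 5
            list(type = type, leave = leave, folds = folds)
          })

print(system.time(ToyParams("Leave-k-Out", leave = 2)))  # first call for this combination
print(system.time(ToyParams("Leave-k-Out", leave = 2)))  # repeated call

# Lists the defined methods plus inherited ones found so far in this session.
showMethods("ToyParams", inherited = TRUE)
```

Adding arguments one at a time and re-timing the first call in a fresh session may show where the dispatch cost starts to grow.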


[Bioc-devel] Delayed Assignment to S4 Slots

2021-10-13 Thread Dario Strbenac via Bioc-devel
Good day,

I have an S4 class with some slots in my Bioconductor package. One of the slots 
stores the range of top variables to try during feature selection (the 
variables might be ranked by some score, like a t-test). The empty constructor 
looks like

setMethod("ResubstituteParams", "missing", function()
{
  new("ResubstituteParams", nFeatures = seq(10, 100, 10),
      performanceType = "balanced error")
})

However, someone might have a small omics data set with only 40 features (e.g.
CyTOF), so trying the top 10, 20, ..., 100 is not a good default. A good
default would wait until the object is accessed within cross-validation and
then, based on the dimensions of the matrix or DataFrame, pick a suitable
range. I looked at delayedAssign, but its x argument is described as "a variable
name (given as a quoted string in the function call)". Based on my
understanding, it doesn't seem to apply to S4 slots.

> r <- ResubstituteParams()
> delayedAssign("r@nFeatures", nrow(measurements))
> measurements <- matrix(1:100, ncol = 10)
> r@nFeatures # Still the value from empty constructor.
 [1]  10  20  30  40  50  60  70  80  90 100
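Because delayedAssign() binds a variable whose name is the given string, the call above most likely creates a variable literally named `r@nFeatures` and leaves the slot untouched. A common alternative, sketched here under stated assumptions, is to store a sentinel such as NULL in the slot and resolve it once the data are available. The class ToyResubstituteParams, the helper resolveNFeatures() and the rule of ten evenly spaced sizes are all hypothetical, and features are assumed to be in the columns of measurements.

```r
library(methods)

# Hypothetical class: nFeatures holds concrete sizes, or NULL meaning
# "choose later, once the data dimensions are known".
setClassUnion("numericOrNULL", c("numeric", "NULL"))
setClass("ToyResubstituteParams",
         slots = c(nFeatures = "numericOrNULL", performanceType = "character"))

ToyResubstituteParams <- function(nFeatures = NULL,
                                  performanceType = "balanced error") {
  new("ToyResubstituteParams", nFeatures = nFeatures,
      performanceType = performanceType)
}

# Hypothetical resolver, called inside cross-validation once the data exist.
# Assumes features are the columns of `measurements`.
resolveNFeatures <- function(params, measurements) {
  if (is.null(params@nFeatures)) {
    nFeat <- ncol(measurements)
    step <- max(1, nFeat %/% 10)
    # Up to ten evenly spaced sizes, capped at the number of features.
    sizes <- pmin(seq(step, by = step, length.out = 10), nFeat)
    params@nFeatures <- unique(sizes)
  }
  params  # user-supplied sizes are left unchanged
}

cytof <- matrix(rnorm(5 * 40), nrow = 5, ncol = 40)  # 5 samples, 40 features
resolveNFeatures(ToyResubstituteParams(), cytof)@nFeatures
```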


Re: [Bioc-devel] Windows-specific Function Not Found Error

2021-10-12 Thread Dario Strbenac via Bioc-devel
Hello,

Ah, I had a few different uses of MultiAssayExperiment::colData in a particular
function of the package, but one line used colData without the namespace
prefix. I wish that R error messages displayed R file names and line numbers
more often.
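Since the error did not say where the unqualified call was, a small scan of the package's R files can list such calls with line numbers. The sketch below is hypothetical; it relies on the token types reported by utils::getParseData() and only catches direct calls written as colData(...), not calls made through do.call() or passed function references. Why only Windows failed is presumably that its default back-end uses separate worker processes without the same packages attached, but that is an assumption.

```r
# Hypothetical helper: report line numbers in one R source file where
# `fun_name` is called without a pkg:: or pkg::: prefix.
find_unqualified_calls <- function(path, fun_name) {
  exprs <- parse(file = path, keep.source = TRUE)  # keep.source is needed for parse data
  tokens <- utils::getParseData(exprs)
  tokens <- tokens[tokens$terminal, ]
  tokens <- tokens[order(tokens$line1, tokens$col1), ]
  # Token immediately before each terminal token ("" for the first one).
  previous <- c("", head(tokens$token, -1))
  is_target <- tokens$token == "SYMBOL_FUNCTION_CALL" & tokens$text == fun_name
  qualified <- previous %in% c("NS_GET", "NS_GET_INT")
  tokens$line1[is_target & !qualified]
}

# Usage, assuming the package sources live in R/:
# for (f in list.files("R", pattern = "\\.[Rr]$", full.names = TRUE)) {
#   lines <- find_unqualified_calls(f, "colData")
#   if (length(lines) > 0) cat(f, ":", lines, "\n")
# }
```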


[Bioc-devel] Windows-specific Function Not Found Error

2021-10-12 Thread Dario Strbenac via Bioc-devel
Good day,

I see a check failure for ClassifyR on Windows Server 2019 only. The error
is:

Error: BiocParallel errors
  4 remote errors, element index: 1, 4, 6, 8
  6 unevaluated and other errors
  first remote error: could not find function "colData"

Is there anything I can change in my code to help it pass? The error doesn't
appear on the two other Bioconductor servers.
