[Bioc-devel] Empty DataFrame Causes SummarizedExperiment Constructor Error
Good day,

The default value of colData is DataFrame(). Not specifying an informative colData is fine.

    countsMini <- matrix(rpois(100, 100), ncol = 10)
    colnames(countsMini) <- paste("Cell", 1:10)
    rownames(countsMini) <- paste("Gene", 1:10)
    SummarizedExperiment(assays = list(counts = countsMini)) # Creates the object successfully.

But explicitly specifying an empty DataFrame triggers an error. I don't understand why it is not equivalent to the constructor's default.

    SummarizedExperiment(assays = list(counts = countsMini), colData = DataFrame())
    Error in `rownames<-`(`*tmp*`, value = .get_colnames_from_first_assay(assays)) :
      invalid rownames length

What is the subtle difference? It also seems that a clearer error message could be emitted if this were caught in the right place.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
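One possible explanation for the difference reported above, offered as an assumption and not as a confirmed reading of the SummarizedExperiment source, is that the constructor checks missing(colData) and only then builds a zero-column table with one row per sample. The base R sketch below reproduces that pattern with a hypothetical makeAnnotated() function and plain data frames; it is not the package's code.

```r
# Hypothetical constructor: the default of 'annotation' looks like an empty
# table, but the body treats "not supplied" differently from "supplied".
makeAnnotated <- function(values, annotation = data.frame()) {
  if (missing(annotation)) {
    # Not supplied: build a zero-column table with one row per sample.
    annotation <- data.frame(row.names = names(values))
  }
  # Supplied: the table is used as is, so its row count must already match.
  rownames(annotation) <- names(values)
  annotation
}

values <- c(Cell1 = 5, Cell2 = 7, Cell3 = 9)

# Argument omitted: three rows, no columns.
implicit <- makeAnnotated(values)

# Empty table passed explicitly: zero rows cannot take three row names.
explicit <- try(makeAnnotated(values, data.frame()), silent = TRUE)
inherits(explicit, "try-error")
```

Under the same assumption, an explicitly supplied colData would need one row per sample, for example DataFrame(row.names = colnames(countsMini)).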
Re: [Bioc-devel] TCGAbiolinks fails
Good day,

The package has check errors which its developers need to fix themselves.

    Quitting from lines 114-121 (subtypes.Rmd)
    Error: processing vignette 'subtypes.Rmd' failed with diagnostics:
    object 'lgg.gbm.subtype' not found

The installation error simply indicates that the package has never built successfully in Bioconductor 3.17.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
Re: [Bioc-devel] S4 Methods Documentation Convention Triggers Warnings
Good day,

So, is the ultimate solution to manually change everything to the following format?

    \item{\code{show(x)}:}{ ... }

The warnings persist, so it does not seem as though R will revert to allowing the currently popular syntax past its check.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
[Bioc-devel] S4 Methods Documentation Convention Triggers Warnings
Good day,

For a long time, it has been a convention to document S4 methods in the format:

    \section{Displaying}{
      In the code snippets below, \code{x} is a GRanges object.
      \describe{
        \item{}{
          \code{show(x)}: Displays the first five and last five elements.
        }
      }
    }

In R Under Development, this is now a warning:

    * checking Rd files ... WARNING
    checkRd: (5) GRanges-class.Rd:115-165: \item in \describe must have non-empty label.

This affects my own package as well as the core Bioconductor packages which I used as inspiration for designing my package documentation seven years ago. What should the new convention be? Or could R developers be convinced to get rid of this check before this prototype is released?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
Re: [Bioc-devel] DataFrameList to Wide Format DataFrame
Hello,

Ah, yes, the sample names should of course be in the rows - Friday afternoon error. In the question, I specified "largely the same set of features", implying that the overlap is not complete. So, the example below will error.

    DFL <- DataFrameList(X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
                         Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))
    unlist(DFL)
    Error in .aggregate_and_align_all_colnames(all_colnames, strict.colnames = strict.colnames) :
      the DFrame objects to combine must have the same column names

This is long but works:

    allFeatures <- unique(unlist(lapply(DFL, colnames)))
    DFL <- lapply(DFL, function(DF)
    {
      missingFeatures <- setdiff(allFeatures, colnames(DF))
      DF[missingFeatures] <- NA
      DF
    })
    DFLflattened <- do.call(rbind, DFL)

Is there a one-line function for it?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
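The padding approach above can be wrapped into a single reusable call that also records which data set each row came from, as the original question asked. This is only a sketch: it uses plain base R data frames in place of DataFrame objects, and stackWithOrigin() and its originColumn argument are hypothetical names, not part of the S4Vectors API.

```r
# Hypothetical helper: stack a named list of data frames that share only
# some of their columns, recording the list name of each row's source.
stackWithOrigin <- function(tables, originColumn = "dataset") {
  allFeatures <- unique(unlist(lapply(tables, colnames)))
  padded <- Map(function(tbl, origin) {
    # Add every feature absent from this table as an all-NA column.
    for (feature in setdiff(allFeatures, colnames(tbl))) tbl[[feature]] <- NA
    tbl <- tbl[allFeatures]        # Same column order in every table.
    tbl[[originColumn]] <- origin  # Keep the list name alongside the data.
    tbl
  }, tables, names(tables))
  # Unnamed list, so rbind keeps the original row names unprefixed.
  do.call(rbind, unname(padded))
}

tables <- list(X = data.frame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
               Y = data.frame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))
flattened <- stackWithOrigin(tables)
dim(flattened)  # 6 rows; columns a, b, c and dataset
```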
[Bioc-devel] DataFrameList to Wide Format DataFrame
Good day,

Is there a function in the S4Vectors API which converts a DataFrameList into a DataFrame, automatically putting the list names into one of the metadata columns, analogous to MultiAssayExperiment's wideFormat function? The scenario is multiple data sets from different organisations measuring largely the same set of features and patient outcome, but on completely different sets of patients in each organisation.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
Re: [Bioc-devel] bpparam Non-deterministic Default
Hello,

Might it instead be made possible to set an RNGseed value by specifying one to bpparam while still getting the automated back-end selection, so that it could easily be set to a particular value in an R package?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
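Something close to this may already be achievable with the bpRNGseed() replacement function that BiocParallel exports, applied to whatever bpparam() returns. The sketch below assumes BiocParallel is installed; whether repeated calls produce identical draws depends on the BiocParallel version, so the two runs are compared and no result is asserted.

```r
library(BiocParallel)

# Keep the automatically selected back-end, then fix its random number seed.
param <- bpparam()
bpRNGseed(param) <- 12345  # The value suggested in the original post.

drawOne <- function(i) rnorm(1)
first  <- bplapply(1:4, drawOne, BPPARAM = param)
second <- bplapply(1:4, drawOne, BPPARAM = param)

# Compare the two runs instead of assuming that they agree.
identical(first, second)
```

A package could do this once internally and pass param on as BPPARAM, while still letting the user override it.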
[Bioc-devel] bpparam Non-deterministic Default
Good day,

I maintain an R package which makes use of functions such as bplapply, which have bpparam() as the default. I have received feedback from a beginner user that the results changed when he knitted his R Markdown document a second time. This stems from the default constructor of bpparam(), which sets no RNGseed. I am wondering about the desirability of changing the RNGseed default in BiocParallel to a particular uncontroversial number, such as 12345, so that beginners get deterministic behaviour.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
[Bioc-devel] S4 Method Slow Execution if Signature Has Multiple Class Unions
Good day,

I created two constructor methods for a generic function. One is the default empty constructor and the other is for when one or more parameters are specified by the user. The method signatures are:

1. c("missing", "missing", "missing", "missing", "missing", "missing", "missing", "missing"),
2. c("characterOrMissing", "numericOrMissing", "numericOrMissing", "numericOrMissing", "numericOrMissing", "characterOrMissing", "BiocParallelParamOrMissing", "numericOrMissing")

The class unions are defined as you might expect.

    setClassUnion("characterOrMissing", c("character", "missing"))
    setClassUnion("numericOrMissing", c("numeric", "missing"))
    setClassUnion("BiocParallelParamOrMissing", c("BiocParallelParam", "missing"))

The first method works as expected:

    > system.time(CrossValParams())
       user  system elapsed
      0.165   0.000   0.165

The second takes over ten minutes and constantly uses 100% CPU, according to top.

    > system.time(CrossValParams("Leave-k-Out", leave = 2))
       user  system elapsed
    760.018  15.093 775.090

Strangely, if I run the same code a second time, it finishes quickly.

    > system.time(CrossValParams("Leave-k-Out", leave = 2))
       user  system elapsed
      0.145   0.000   0.145

I haven't been able to come up with a minimal reproducible example of the issue. How can this be done consistently and efficiently?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
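One alternative design, sketched here as a suggestion and not as a diagnosis of the delay, is to dispatch on the first argument only by using the signature argument of setGeneric(), and to give the remaining arguments ordinary default values. The class, the generic and the argument name samplesSplits are hypothetical, cut-down stand-ins for the real constructor, and whether this removes the slow first call in the actual package is untested.

```r
library(methods)

# Hypothetical, cut-down stand-in for the real parameter class.
setClass("CrossValSketchParams",
         representation(samplesSplits = "character", leave = "numeric"))

# Dispatch on the first argument only; 'leave' is an ordinary argument
# with a default value, so no class union with "missing" is needed.
setGeneric("CrossValSketch",
           function(samplesSplits, leave = 2) standardGeneric("CrossValSketch"),
           signature = "samplesSplits")

# Empty constructor: fill in a default scheme and delegate.
setMethod("CrossValSketch", "missing", function(samplesSplits, leave = 2) {
  CrossValSketch("Leave-k-Out", leave = leave)
})

# Constructor for a scheme chosen by the user.
setMethod("CrossValSketch", "character", function(samplesSplits, leave = 2) {
  new("CrossValSketchParams", samplesSplits = samplesSplits, leave = leave)
})

defaults <- CrossValSketch()
custom <- CrossValSketch("Leave-k-Out", leave = 3)
custom@leave
```

With a single dispatched argument there are only two methods to choose between, instead of a search over combinations of eight class unions.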
[Bioc-devel] Delayed Assignment to S4 Slots
Good day,

I have an S4 class with some slots in my Bioconductor package. One of the slots stores the range of top variables to try during feature selection (the variables might be ranked by some score, like a t-test). The empty constructor looks like

    setMethod("ResubstituteParams", "missing", function()
    {
      new("ResubstituteParams", nFeatures = seq(10, 100, 10), performanceType = "balanced error")
    })

But someone might have a small omics data set with only 40 features (e.g. CyTOF). Therefore, trying the top 10, 20, ..., 100 is not a good default. A good default would wait until the S4 class is accessed within cross-validation and then, based on the dimensions of the matrix or DataFrame, pick a suitable range. I looked at delayedAssign, but x is described as "a variable name (given as a quoted string in the function call)". It doesn't seem to apply to S4 slots, based on my understanding of it.

    > r <- ResubstituteParams()
    > delayedAssign("r@nFeatures", nrow(measurements))
    > measurements <- matrix(1:100, ncol = 10)
    > r@nFeatures # Still the value from empty constructor.
     [1]  10  20  30  40  50  60  70  80  90 100

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
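The delayedAssign() call above binds a promise to a new variable literally named "r@nFeatures" and leaves the slot of r untouched, which is why the slot keeps its old value. One way to get a data-dependent default, sketched below under assumptions, is to let the slot hold NULL as a marker for "not decided yet" and to fill it in once the number of features is known. The class ResubstituteSketch, the helper resolveFeatures() and its nAvailable argument are hypothetical names, not part of the package.

```r
library(methods)

# Hypothetical class: NULL in nFeatures means "decide once the data are seen".
setClassUnion("numericOrNULL", c("numeric", "NULL"))
setClass("ResubstituteSketch",
         representation(nFeatures = "numericOrNULL", performanceType = "character"))

ResubstituteSketch <- function(nFeatures = NULL, performanceType = "balanced error") {
  new("ResubstituteSketch", nFeatures = nFeatures, performanceType = performanceType)
}

# Hypothetical helper, to be called inside cross-validation once the number
# of available features is known.
resolveFeatures <- function(params, nAvailable) {
  if (is.null(params@nFeatures)) {
    candidates <- seq(10, 100, 10)
    candidates <- candidates[candidates <= nAvailable]
    if (length(candidates) == 0) candidates <- nAvailable  # Fewer than 10 features.
    params@nFeatures <- candidates
  }
  params
}

params <- ResubstituteSketch()
resolved <- resolveFeatures(params, nAvailable = 40)
resolved@nFeatures  # 10 20 30 40
```

A value given explicitly by the user is left alone, because only a NULL slot is replaced.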
Re: [Bioc-devel] Windows-specific Function Not Found Error
Hello,

Ah, I had a few different uses of MultiAssayExperiment::colData in a particular function of the package, but one line had only colData without the scoping in front. I wish that R error messages displayed R file names and line numbers more often.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
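The thread does not say why only the Windows build failed. A plausible reason, stated here as an assumption, is that workers there run as separate R processes in which packages attached in the main session are not attached. The sketch below shows the same kind of failure using only the base parallel package, with tools::file_ext standing in for colData.

```r
library(parallel)
library(tools)  # Attached in the main session only, not on the worker.

cl <- makeCluster(1)  # One separate R process acting as the worker.

# Unqualified call: the worker has no 'tools' on its search path.
failure <- tryCatch(clusterEvalQ(cl, file_ext("counts.csv")),
                    error = function(e) conditionMessage(e))

# Qualified call: the package is named explicitly, so the lookup succeeds.
success <- clusterEvalQ(cl, tools::file_ext("counts.csv"))

stopCluster(cl)

failure       # An error message about a function that could not be found.
success[[1]]  # "csv"
```

Qualifying the call, or importing the function in the package NAMESPACE, makes the lookup independent of what happens to be attached on each worker.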
[Bioc-devel] Windows-specific Function Not Found Error
Good day,

I see a checking failure for ClassifyR for Windows Server 2019 only. The error is

    Error: BiocParallel errors
      4 remote errors, element index: 1, 4, 6, 8
      6 unevaluated and other errors
      first remote error: could not find function "colData"

Is there anything I can change in my code to help it pass? The error doesn't appear on the two other Bioconductor servers.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia