[Bioc-devel] Empty DataFrame Causes SummarizedExperiment Constructor Error
Good day,

The default value of colData is DataFrame(). Not specifying an informative colData is fine.

    countsMini <- matrix(rpois(100, 100), ncol = 10)
    colnames(countsMini) <- paste("Cell", 1:10)
    rownames(countsMini) <- paste("Gene", 1:10)
    SummarizedExperiment(assays = list(counts = countsMini)) # Creates the object successfully.

But explicitly specifying an empty DataFrame triggers an error. I don't understand why it is not equivalent to the constructor's default.

    SummarizedExperiment(assays = list(counts = countsMini), colData = DataFrame())
    Error in `rownames<-`(`*tmp*`, value = .get_colnames_from_first_assay(assays)) :
      invalid rownames length

What is the subtle difference? It also seems like there could be a clearer error message emitted if this is caught in the right place.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
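A possible workaround, sketched on the assumption (suggested by the error message, not confirmed by the package source) that the constructor copies the assay's column names onto the row names of whatever colData it is given: pass a zero-column DataFrame that already has one row per sample. The variable name emptyColData is illustrative only.

```r
library(SummarizedExperiment)

countsMini <- matrix(rpois(100, 100), ncol = 10)
colnames(countsMini) <- paste("Cell", 1:10)
rownames(countsMini) <- paste("Gene", 1:10)

# Zero columns, but ten rows named after the samples, so the row count
# matches the number of assay columns.
emptyColData <- DataFrame(row.names = colnames(countsMini))

se <- SummarizedExperiment(assays = list(counts = countsMini),
                           colData = emptyColData)
dim(colData(se))
```

A plain DataFrame() has zero rows, which is likely why it cannot stand in for ten samples even though it is printed as the default in the function signature.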
Re: [Bioc-devel] TCGAbiolinks fails
Good day, The package has checking errors which the developers of it need to fix themselves. Quitting from lines 114-121 (subtypes.Rmd) Error: processing vignette 'subtypes.Rmd' failed with diagnostics: object 'lgg.gbm.subtype' not found The installation error simply indicates that the package has never built successfully in Bioconductor 3.17. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] S4 Methods Documentation Convention Triggers Warnings
Good day, So, is the ultimate solution to manually change everything to the format of \item{\code{show(x)}:}{ ... } ? The warnings persist, so it does not seem as though R will revert to allowing the currently-popular syntax past its check. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] S4 Methods Documentation Convention Triggers Warnings
Good day,

For a long time, it has been a convention to document S4 methods in the format:

    \section{Displaying}{
      In the code snippets below, \code{x} is a GRanges object.
      \describe{
        \item{}{
          \code{show(x)}: Displays the first five and last five elements.
        }
      }
    }

In R Under Development, this is now a warning:

    * checking Rd files ... WARNING
    checkRd: (5) GRanges-class.Rd:115-165: \item in \describe must have non-empty label.

This affects my own package as well as the core Bioconductor packages which I used as inspiration for designing my package documentation seven years ago. What should the new convention be? Or could R developers be convinced to get rid of this check before this prototype is released?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
Re: [Bioc-devel] DataFrameList to Wide Format DataFrame
Hello,

Ah, yes, the sample names should of course be in the rows - Friday afternoon error. In the question, I specified "largely the same set of features", implying that the overlap is not complete. So, the example below will error.

    DFL <- DataFrameList(X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
                         Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))
    unlist(DFL)
    Error in .aggregate_and_align_all_colnames(all_colnames, strict.colnames = strict.colnames) :
      the DFrame objects to combine must have the same column names

This is long but works:

    allFeatures <- unique(unlist(lapply(DFL, colnames)))
    DFL <- lapply(DFL, function(DF)
    {
      missingFeatures <- setdiff(allFeatures, colnames(DF))
      DF[missingFeatures] <- NA
      DF
    })
    DFLflattened <- do.call(rbind, DFL)

Is there a one-line function for it?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
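One candidate for a shorter route is combineRows from S4Vectors, which stacks DataFrames and fills columns missing from any input with NA. The sketch below assumes a version of S4Vectors recent enough to provide it; the column name dataset and the variable names tables and flattened are illustrative only.

```r
library(IRanges)  # provides DataFrameList and attaches S4Vectors

DFL <- DataFrameList(X = DataFrame(a = 1:3, b = 3:1, row.names = LETTERS[1:3]),
                     Y = DataFrame(b = 4:6, c = 6:4, row.names = LETTERS[20:22]))

# Drop the list names before do.call so that the first table is matched
# positionally to combineRows's first argument.
tables <- unname(as.list(DFL))
flattened <- do.call(combineRows, tables)

# Record which data set each row came from.
flattened$dataset <- rep(names(DFL), vapply(tables, nrow, integer(1)))
flattened
```

The extra dataset column plays the role that the list names would otherwise lose when the tables are stacked.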
[Bioc-devel] DataFrameList to Wide Format DataFrame
Good day, Is there a function in the S4Vectors API which converts a DataFrameList into a DataFrame, automatically putting the list names into one of the metadata columns, analogous to MultiAssayExperiment's wideFormat function? The scenario is multiple data sets from different organisations measuring largely the same set of features and patient outcome, but on completely different sets of patients in each organisation. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
Re: [Bioc-devel] bpparam Non-deterministic Default
Hello, Might it instead be made possible to set an RNGseed value by specifying one to bpparam while still getting the automated back-end selection, so that it could easily be set to a particular value in an R package? -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
[Bioc-devel] bpparam Non-deterministic Default
Good day, I maintain an R package which makes use of functions such as bplapply, which has bpparam() as the default. I have received feedback from a beginner user that the results changed when he knitted his R Markdown document a second time. This stems from the default constructor of bpparam(), which sets no RNGseed. I am wondering about the desirability of changing the RNGseed default in BiocParallel to a particular uncontroversial number, such as 12345, so that beginners get deterministic behaviour. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
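For comparison, a minimal sketch of what a user can do today by passing an explicit seed. It assumes a BiocParallel version in which SerialParam accepts RNGseed (older releases only supported it for some back-ends), and it uses SerialParam purely so that the example runs anywhere; the variable names are illustrative.

```r
library(BiocParallel)

drawOne <- function(i) rnorm(1)

# A fresh parameter object with the same seed is built for each call,
# so both calls start from the same random number stream.
firstRun  <- bplapply(1:3, drawOne, BPPARAM = SerialParam(RNGseed = 12345))
secondRun <- bplapply(1:3, drawOne, BPPARAM = SerialParam(RNGseed = 12345))

identical(firstRun, secondRun)
```

This does not address the default itself: a call that relies on bpparam() with no arguments still has no seed unless the caller supplies one.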
[Bioc-devel] S4 Method Slow Execution if Signature Has Multiple Class Unions
Good day,

I created two constructor methods for a generic function. One is for the default empty constructor and the other is a constructor for when one or more parameters are specified by the user. The method signatures are:

    1. c("missing", "missing", "missing", "missing", "missing", "missing", "missing", "missing"),
    2. c("characterOrMissing", "numericOrMissing", "numericOrMissing", "numericOrMissing", "numericOrMissing", "characterOrMissing", "BiocParallelParamOrMissing", "numericOrMissing")

The class unions are defined as you might expect.

    setClassUnion("characterOrMissing", c("character", "missing"))
    setClassUnion("numericOrMissing", c("numeric", "missing"))
    setClassUnion("BiocParallelParamOrMissing", c("BiocParallelParam", "missing"))

The first method works as expected:

    > system.time(CrossValParams())
       user  system elapsed
      0.165   0.000   0.165

The second takes over ten minutes and constantly uses 100% of a CPU, according to top.

    > system.time(CrossValParams("Leave-k-Out", leave = 2))
       user  system elapsed
    760.018  15.093 775.090

Strangely, if I rerun this code, it works quickly the second time.

    > system.time(CrossValParams("Leave-k-Out", leave = 2))
       user  system elapsed
      0.145   0.000   0.145

I haven't been able to come up with a minimal reproducible example of the issue. How can this be done consistently and efficiently?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
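One way to sidestep multi-argument dispatch on class unions altogether is an ordinary constructor function with default arguments. The sketch below is a simplified stand-in, not the real ClassifyR class: the class name CrossValParamsSketch, its two slots and the default values are all hypothetical.

```r
library(methods)

# Hypothetical, cut-down class with only two of the eight parameters.
setClass("CrossValParamsSketch",
         representation(samplesSplits = "character", leave = "numeric"))

# A plain function: defaults cover the empty-constructor case, so no
# "missing" signatures or class unions are needed for dispatch.
CrossValParamsSketch <- function(samplesSplits = "Permute k-Fold", leave = 2)
{
  stopifnot(is.character(samplesSplits), length(samplesSplits) == 1,
            is.numeric(leave), length(leave) == 1)
  new("CrossValParamsSketch", samplesSplits = samplesSplits, leave = leave)
}

defaultParams <- CrossValParamsSketch()
customParams <- CrossValParamsSketch("Leave-k-Out", leave = 2)
```

Argument checking moves from the method signature into the function body, which costs a few lines but avoids searching the inheritance table across eight arguments.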
[Bioc-devel] Delayed Assignment to S4 Slots
Good day,

I have an S4 class with some slots in my Bioconductor package. One of the slots stores the range of top variables to try during feature selection (the variables might be ranked by some score, like a t-test). The empty constructor looks like

    setMethod("ResubstituteParams", "missing", function()
    {
      new("ResubstituteParams", nFeatures = seq(10, 100, 10), performanceType = "balanced error")
    })

But someone might have a small omics data set with only 40 features (e.g. CyTOF). Therefore, trying the top 10, 20, ..., 100 is not a good default. A good default would wait until the S4 class is accessed within cross-validation and then, based on the dimensions of the matrix or DataFrame, pick a suitable range. I looked at delayedAssign, but x is described as "a variable name (given as a quoted string in the function call)". It doesn't seem to apply to S4 slots, based on my understanding of it.

    > r <- ResubstituteParams()
    > delayedAssign("r@nFeatures", nrow(measurements))
    > measurements <- matrix(1:100, ncol = 10)
    > r@nFeatures # Still the value from empty constructor.
     [1]  10  20  30  40  50  60  70  80  90 100

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
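An alternative to delayedAssign, sketched here with hypothetical names (ResubstituteParamsSketch, numericOrNULL, resolveNFeatures), is to let the slot hold NULL as a "not chosen yet" marker and to compute the range at the point where the data are available. It assumes, as in the example above, that features are counted with nrow.

```r
library(methods)

setClassUnion("numericOrNULL", c("numeric", "NULL"))
setClass("ResubstituteParamsSketch",
         representation(nFeatures = "numericOrNULL", performanceType = "character"))

# NULL means: decide once the size of the data set is known.
ResubstituteParamsSketch <- function(nFeatures = NULL, performanceType = "balanced error")
  new("ResubstituteParamsSketch", nFeatures = nFeatures, performanceType = performanceType)

# Called inside cross-validation, where the measurements are in scope.
resolveNFeatures <- function(params, measurements)
{
  if (!is.null(params@nFeatures)) return(params@nFeatures)
  top <- min(100, nrow(measurements))
  if (top < 10) seq_len(top) else seq(10, top, 10)
}

smallData <- matrix(0, nrow = 40, ncol = 5)
resolveNFeatures(ResubstituteParamsSketch(), smallData)
```

A user who does specify nFeatures keeps full control, because the resolver only fills in the range when the slot is still NULL.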
Re: [Bioc-devel] Windows-specific Function Not Found Error
Hello, Ah, I had a few different uses of MultiAssayExperiment::colData in a particular function of the package, but one line had only colData without the scoping in front. I wish that R error messages displayed R file names and line numbers more often. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] Windows-specific Function Not Found Error
Good day, I see a checking failure for ClassifyR for Windows Server 2019 only. The error is Error: BiocParallel errors 4 remote errors, element index: 1, 4, 6, 8 6 unevaluated and other errors first remote error: could not find function "colData" Is there anything I can change in my code to help it pass? The error doesn't appear on the two other Bioconductor servers. ------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] VariantAnnotation Installation Compile Error
Hello, The problem stemmed from an .Rprofile file which was setting .libPaths with the directory path to a library of packages for the previous version of R. Starting R with the --vanilla option avoided the problem. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
Re: [Bioc-devel] VariantAnnotation Installation Compile Error
Good day, I also see NULL on R start-up. I installed R from source. $ R-4.1.0/bin/R R version 4.1.0 (2021-05-18) -- "Camp Pontanezen" ...... Type 'q()' to quit R. NULL > I noticed a couple of error messages at the end of the installation which I thought were harmless. I will reinstall R. The extracted directory and prefix directory were the same, which might be problematic. ------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] VariantAnnotation Installation Compile Error
Good day, No, the temporary directory has space remaining. I wonder what file it is referring to by "No such file or directory". I had an idea to reinstall Biostrings using force = TRUE, but it didn't help. ------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] VariantAnnotation Installation Compile Error
Good day,

I apparently have a valid Bioconductor package library but VariantAnnotation won't install successfully.

    > valid()
    [1] TRUE
    > install("VariantAnnotation")
    Bioconductor version 3.13 (BiocManager 1.30.15), R 4.1.0 (2021-05-18)
    Installing package(s) 'VariantAnnotation'
    trying URL 'https://bioconductor.org/packages/3.13/bioc/src/contrib/VariantAnnotation_1.38.0.tar.gz'
    Content type 'application/x-gzip' length 1726088 bytes (1.6 MB)
    ==
    downloaded 1.6 MB

    NULL
    * installing *source* package ‘VariantAnnotation’ ...
    ** using staged installation
    ** libs
    gcc -I"/verona/biostat/software/R-4.1.0/include" -DNDEBUG NULL -D_FILE_OFFSET_BITS=64 -I'/dskh/biostat/software/R-4.1.0/library/S4Vectors/include' -I'/dskh/biostat/software/R-4.1.0/library/IRanges/include' -I'/dskh/biostat/software/R-4.1.0/library/XVector/include' -I'/dskh/biostat/software/R-4.1.0/library/Biostrings/include' -I'/dskh/biostat/software/R-4.1.0/library/Rhtslib/include' -I/usr/local/include -fpic -g -O2 -c Biostrings_stubs.c -o Biostrings_stubs.o
    gcc: error: NULL: No such file or directory
    make: *** [/verona/biostat/software/R-4.1.0/etc/Makeconf:168: Biostrings_stubs.o] Error 1
    ERROR: compilation failed for package ‘VariantAnnotation’

    > sessionInfo()
    R version 4.1.0 (2021-05-18)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Debian GNU/Linux 10 (buster)
    BLAS: /dskh/biostat/software/R-4.1.0/lib/libRblas.so
    LAPACK: /dskh/biostat/software/R-4.1.0/lib/libRlapack.so

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
Re: [Bioc-devel] cannot reproduce the build error with InTAD package
It looks like you are creating a MultiAssayExperiment in your vignette. Numerous Bioconductor packages relying on MultiAssayExperiment infrastructure started failing a few days ago with the release of version 1.17.3, but I don't see the breaking change explained in the News file. ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Recent change in MultiAssayExperiment for inferred MAE-level colData
This also happens to ClassifyR. ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Update R package I developed which has been released by bioconductor
Good day, Step 1: Follow the steps at http://bioconductor.org/developers/how-to/git/push-to-github-bioc/ -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] BiocParallel Variable Not Found
Good day, I am not sure how to fix my package properly, even with the good example. A link to the specific part of my function is https://github.com/DarioS/ClassifyR/blob/e35899caceb401691990136387a517f4c3b57d5e/R/runTests.R#L567 and the example in the help page of runTestsEasyHard function triggers the error shown in Bioconductor's daily build. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] BiocParallel Variable Not Found
Good day, Thanks for the examples which demonstrate the issue. Do you have other recommendations if, inside the loop, another function in the package is being called and the variable being passed is the ellipsis? There are only a couple of variables which might be provided by the user collected in the ellipsis, so the functional approach might still be the best in that case. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] BiocParallel Variable Not Found
Good day, I have a loop in a function of my R package which by default uses bpparam() to set the framework used for parallelisation. On Windows, I see the error Error: BiocParallel errors element index: 1, 2, 3, 4, 5, 6, ... first error: object 'selParams' not found This error does not happen on the Linux or MacOS operating systems. It happens using both R 3.6 and the upcoming version 4. The error can be reproduced running the examples of runTests function in ClassifyR. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
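A sketch of the functional style that avoids relying on variables being visible to the workers, on the assumption that the Windows default back-end runs workers as separate R processes which only see what is shipped with the function. Here selParams is a hypothetical stand-in for the real object, runOne is an illustrative name, and SerialParam is used only so that the example runs anywhere.

```r
library(BiocParallel)

selParams <- list(nFeatures = 3)  # hypothetical stand-in for the real parameters

# Everything the worker needs arrives as an argument of the function,
# rather than being looked up in the calling environment.
runOne <- function(index, selParams) index * selParams[["nFeatures"]]

# Extra named arguments to bplapply are forwarded to the function.
results <- bplapply(1:4, runOne, selParams = selParams, BPPARAM = SerialParam())
unlist(results)
```

The same pattern extends to user options collected in an ellipsis, since they can be forwarded through bplapply in the same way.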
Re: [Bioc-devel] Use of SummerisedExperiments or MultiAssayExperiments of many many Dataframes/ nested List objects
Good day, You are operating with tables of statistical hypothesis test summaries, rather than input data which the tests are done with, so it doesn't make sense to use SummarizedExperiment or MultiAssayExperiment. The data is not experimental measurements. You should try DataFrame from Bioconductor package S4Vectors. It's better than a data.frame and won't flood your console with output. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Whats the timeframe for Bioc Support updates and downtime?
Good day, Could the forum have automatic saving of drafted text like some other forums? -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Bioconductor 3.10 is released!!
Good day, In the development branch, all packages are only built on Linux. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] patch old releases? e.g. RELEASE_3_8
Good day, No; anything older than the release branch at present is not modifiable. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] how to achieve reproducibility with BiocParallel regardless of number of threads and OS (set.seed is disallowed)
Good day, Should setting workers to 1 and RNGseed to a number result in a warning to the user that the seed will effectively be ignored? -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] GAlignments Constructor Type Checking Error
Good day,

Although the documentation states "Generally not used directly", I'm trying it. A small example fails because the input is evaluated to be in the wrong format, but it doesn't seem so when I look at the variable type of strand.

    debug(GAlignments)
    GAlignments("chr1", 1L, strand = Rle(factor('+')), cigar = "10M")
    debug: new("GAlignments", NAMES = names, seqnames = seqnames, start = pos,
        cigar = cigar, strand = strand, elementMetadata = elementMetadata,
        seqinfo = seqinfo)
    Browse[2]> strand
    factor-Rle of length 1 with 1 run
      Lengths: 1
      Values : +
    Levels(1): +
    Browse[2]> n
    Error in validObject(.Object) :
      invalid class “GAlignments” object: 'strand(x)' must be an unnamed 'factor' Rle with no NAs (and with levels +, - and *)

This looks like a false positive to me. Also, it would increase readability if the constructor didn't run off the edge of the PDF page in the reference manual by using \preformatted. Also, I wonder why seqnames is automatically converted into a factor Rle, but strand isn't. Couldn't strand also use .asFactorRle?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
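The validity message asks for the levels +, - and *, whereas factor('+') carries a single level, so the check may be stricter than it first appears rather than wrong. A minimal sketch under that reading, with the three levels spelled out explicitly (the variable names strandLevels and alignment are illustrative):

```r
library(GenomicAlignments)

# All three strand levels are declared, even though only "+" is used.
strandLevels <- c("+", "-", "*")

alignment <- GAlignments(seqnames = "chr1", pos = 1L, cigar = "10M",
                         strand = Rle(factor("+", levels = strandLevels)))
alignment
```

If the constructor coerced strand the way it coerces seqnames, the explicit levels would not be needed, which is the convenience the question asks about.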
[Bioc-devel] GitHub Pages Vignettes
Good day, On the Bioconductor website, scde has no vignettes listed in the Documentation section. Looking at the contents of the package, there is a vignettes directory with four vignettes in it. They each have output: md_document and are hosted on GitHub.io. The vignettes also are not accessible from within R > browseVignettes("scde") No vignettes found by browseVignettes("scde") Is such a documentation design choice suitable for Bioconductor packages? The Vignettes section of Package Guidelines states that a vignette is mandatory, but there is no statement about acceptable output formats of vignettes. Also, the Package Vignettes webpage seems to have been written before HTML vignettes were possible, because it refers only to Rnw and PDF files. Its URL is http://bioconductor.org/help/package-vignettes/ Could such requirements be made explicit? -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] QuasR Overwrites Base Graphics Settings
Good day,

Doing a quality control plot with QuasR overwrites the user's figure margins and further plots don't work. A small, self-contained example is

    plot(1:10) # A plot without error
    library(QuasR)
    testFile <- system.file("extdata", "ex1.bam", package="Rsamtools")
    qQCReport(testFile) # Fails because figure margins too large
    plot(1:10) # Also fails because figure margins too large

The value of par("mar") is different before and after using qQCReport. Can QuasR be changed so that it does not clobber the R session's graphics parameters?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
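The usual base R idiom for a plotting function that leaves the session's graphics parameters alone is to save the old values and restore them with on.exit. The sketch below is not QuasR code; plotWithLocalMargins is a hypothetical function, and a null PDF device is opened only so that the example runs without a display.

```r
plotWithLocalMargins <- function(values)
{
  # par() returns the previous values of the parameters it changes.
  oldPar <- par(mar = c(1, 1, 1, 1))
  # Restore them when the function exits, even if plotting fails.
  on.exit(par(oldPar))
  plot(values)
  invisible(NULL)
}

pdf(NULL)  # off-screen device, nothing is written to disk
marginsBefore <- par("mar")
plotWithLocalMargins(1:10)
marginsAfter <- par("mar")
dev.off()

identical(marginsBefore, marginsAfter)
```

Because the restore is registered before any plotting happens, an error part-way through the plot still leaves the margins as the user set them.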
Re: [Bioc-devel] Unable to install database package in devel version of Bioconductor 3.9
Good day, You need to provide more information to get useful guidance. What version of R did you use? From the error message, it seems that it's less than 3.5.0 but it should be R Under Development. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] package named spdeq causes error
Good day, I don't, but your software package imports agricolae which imports spdep. spdep is available from CRAN, so it's strange that the Bioconductor build server running Linux has not been able to install it. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] support the stable version of R
Good day, R's checking may encourage a dependency on R to be placed in the DESCRIPTION file, based on examining the data files distributed with the package. For ClassifyR, I get a warning if the dependency is absent. * checking data for ASCII and uncompressed saves ... WARNING Warning: package needs dependence on R (>= 2.10) -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] ClassifyR Check Error on Linux and MacOS Systems
Good day, Thanks for running it. I found the error you had happens because of an example which used random sampling and rarely returned a zero-length result. I have made the example deterministic so that it always succeeds. The error you saw is not related to the problem seen on the build servers, though. I found a browser() inside an R function which I forgot to remove before committing. After removal and committal, the error on malbec1 and merida1 is gone. It is surprising that it did not trigger an error when checking the package before committing it and that the error message observed on the build servers was not clear about what the problem was. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] ClassifyR Check Error on Linux and MacOS Systems
Good day, There is an error for ClassifyR on malbec1 and merida1 caused by a documentation example. However, it doesn't occur on tokay1. Can I get more information about which example is emitting the error on malbec1 server? -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Cannot access remote upstream after changing the laptop
Good day, You could also copy the private key from the old computer to the new computer, if you still can use the old computer. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Need update access agaisnt package 'banocc'
Good day, The warning message is caused by a documentation linking inconsistency in R running on different operating systems. It may be avoided, but it's not essential. Perhaps documentation linking will soon be consistent between operating systems. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
Re: [Bioc-devel] Windows error "UCSC library operation failed" in package karyoploteR
Good day, The import of BigWig files does not work on Windows and is documented. Execute ?BigWigFile-class and notice in the Description section: "These functions do not work on Windows.". ------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] EXTERNAL: Fwd: PREDA problems reported in the Multiple platform build/check report for BioC 3.7
Good day, Similar to you, I am awaiting the restoration of sparsediscrim which was removed on the same day as PREDA. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] Action for Uncompressed Data Warning
Good day,

I added a new data set to a package I develop and there is the warning:

    * checking data for ASCII and uncompressed saves ... WARNING
      Note: significantly better compression could be obtained
            by using R CMD build --resave-data
                   old_size new_size compress
      asthma.RData    715Kb    484Kb    bzip2

Should I ignore it or save it again with compression? The 231 Kb reduction in file size seems insignificant.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
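If the file is to be saved again, the tools package that ships with R can do it in place. The sketch below works on a throwaway file in a temporary directory with made-up data (asthmaSketch is a hypothetical stand-in), so the sizes it reports say nothing about the real asthma.RData.

```r
# Made-up, highly repetitive data saved without compression.
dataFile <- file.path(tempdir(), "asthmaSketch.RData")
asthmaSketch <- data.frame(sample = rep(1:1000, each = 50),
                           value = rep(c(0.5, 1.5), 25000))
save(asthmaSketch, file = dataFile, compress = FALSE)
sizeBefore <- file.size(dataFile)

# Rewrite the file in place using the compression that the check suggested.
tools::resaveRdaFiles(dataFile, compress = "bzip2")
sizeAfter <- file.size(dataFile)

# Report the size and compression type now recorded for the file.
tools::checkRdaFiles(dataFile)
```

Running R CMD build with --resave-data, as the note suggests, applies the same kind of recompression at build time without touching the source copy.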
Re: [Bioc-devel] BiocParallel on Windows Never Ends
Good day,

I couldn't get a working param object. It never completes the command

    param = bpstart(SnowParam(2, manager.hostname = "144.130.152.1", manager.port = 2559))

I obtained the IP address by typing "My IP address" into Google and it gave me the address shown. I used netstat -an and

    Proto  Local Address    Foreign Address  State
    TCP    127.0.0.1:2559   0.0.0.0:0        LISTENING

was one of the results displayed. I have reproduced this problem on another computer with Windows 10. I also tried

    param = bpstart(SnowParam(2, manager.hostname = "127.0.0.1", manager.port = 2559))

but it doesn't complete. I was able to identify that the problem is with the line

    bpbackend(x) <- do.call(parallel::makeCluster, cargs)

So, to summarise,

    > cargs
    $`spec`
    [1] 2

    $type
    [1] "SOCK"

    $snowlib
    [1] "C:/Program Files/R/R-3.5.0/library/BiocParallel"

    $master
    [1] "127.0.0.1"

    $port
    [1] 2559

    > do.call(parallel::makeCluster, cargs) # Freezes.

Should I ask the question on R-devel, because it doesn't appear to be specific to Bioconductor?

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
[Bioc-devel] BiocParallel on Windows Never Ends
Good day,

I was interested in how my package performs on a 32-bit Windows computer because I'm going to give a workshop about it soon and some people might bring old laptops. I found that using SnowParam with workers set to more than 1 never finishes. The minimal code to cause the issue is:

    bplapply(1:10, function(i) LETTERS[i], BPPARAM = SnowParam(workers = 1)) # Immediately returns a result.
    bplapply(1:10, function(i) LETTERS[i], BPPARAM = SnowParam(workers = 2)) # Never completes.

    > sessionInfo()
    R version 3.5.0 (2018-04-23)
    Platform: i386-w64-mingw32/i386 (32-bit)
    Running under: Windows 7 (build 7601) Service Pack 1

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
    [5] LC_TIME=English_Australia.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] BiocParallel_1.14.1

    loaded via a namespace (and not attached):
    [1] compiler_3.5.0 snow_0.4-2     parallel_3.5.0

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
Re: [Bioc-devel] Unexpected Warning About Cross-Reference Without Package Specification
Good day, Thanks. I'll use the [limma] specifier to avoid the Warning from the build system. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Unexpected Warning About Cross-Reference Without Package Specification
Good day, I created a minimalist package that demonstrates the issue and it is attached to this letter. After using R CMD build, the subsequent R CMD check process emits one warning. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia Tester.tar.gz Description: Tester.tar.gz ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Unexpected Warning About Cross-Reference Without Package Specification
Good day, limma was installed using biocLite, so it would be built before R CMD check was run. I could summarise all of the relevant information and send to R-package-devel mailing list to check if it is a bug. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Unexpected Warning About Cross-Reference Without Package Specification
Good day, Indeed, it is in the Suggests component of the dependency specification. I didn't find any extra requirements for this case in the Cross-references section of Writing R Extensions, so I'm unsure of where to read about the rule. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] BiocInstaller: next generation
Good day, The features of the proposed package seem a lot like BiocInstaller. Once I have upgraded R and have the newest BiocInstaller installed using the bootstrapping technique of source("https://bioconductor.org/biocLite.R"), I typically do library(BiocInstaller) followed by biocLite("GenomicAlignments") to install the GenomicAlignments package in a subsequent R session, for instance. This avoids repetitive sourcing of the biocLite script from the Bioconductor server. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
[Bioc-devel] Unexpected Warning About Cross-Reference Without Package Specification
Good day,

I have documented a parameter that is linked to lmFit's documentation.

    \item{...}{Optional settings that are passed to \code{\link{lmFit}}.}

The package checking process displays a warning.

    * checking Rd cross-references ... WARNING
    Missing link or links in documentation object 'limmaSelection.Rd':
      ‘lmFit’

If I add [limma] to the cross-reference, the link is resolved.

    * checking Rd cross-references ... OK

Why is the package specification not optional in this scenario? I am using the latest release of R.

    * using R version 3.5.0 (2018-04-23)
    * using platform: x86_64-pc-linux-gnu (64-bit)
    * using session charset: UTF-8

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
[Bioc-devel] mcols Function Not Found for Windows Build
Good day, I notice an error happening when the vignette of ClassifyR is checked by tokay2. mcols is not found. I viewed the check reports of S4Vectors, and there are some Warnings for all operating systems, but no platform has Error, so it's unlikely to be related to the problem. Is there a way to make ClassifyR guard against this problem in Windows? I don't know how to begin solving this issue. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] BiocCheck - warning: files are over 5MB
Good day, You could make use of the package named BSgenome.Celegans.UCSC.ce11. It contains the DNA sequences of all of the chromosomes of the roundworm and doesn't add any size to your package. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] Numeric Operation on DataFrame
Good day,

Would it be useful to provide the same operations which can be done to a data.frame for a DataFrame in a future release of S4Vectors? For example,

    dataTable <- data.frame(aFeature = 1:5, anotherFeature = 5:1)
    colMeans(dataTable)
    #       aFeature anotherFeature
    #              3              3

    dataTableS4 <- DataFrame(aFeature = 1:5, anotherFeature = 5:1)
    colMeans(dataTableS4)
    Error in colMeans(dataTableS4) :
      'x' must be an array of at least two dimensions

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
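Until such methods exist, one possible workaround is to coerce the DataFrame to a base data.frame before summarising. The sketch below is only an illustration: it assumes S4Vectors is installed and that every column is numeric, and the guard keeps the script runnable when the package is absent.

```r
# Base R: colMeans() works directly on a data.frame of numeric columns.
dataTable <- data.frame(aFeature = 1:5, anotherFeature = 5:1)
baseMeans <- colMeans(dataTable)

# Assumption: S4Vectors is installed. as.data.frame() is used to coerce the
# DataFrame to a base data.frame before calling colMeans().
if (requireNamespace("S4Vectors", quietly = TRUE)) {
  dataTableS4 <- S4Vectors::DataFrame(aFeature = 1:5, anotherFeature = 5:1)
  s4Means <- colMeans(as.data.frame(dataTableS4))
  stopifnot(isTRUE(all.equal(s4Means, baseMeans)))
}
```

The coercion copies the data, so it may be unsuitable for very large tables.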
Re: [Bioc-devel] Emails to package maintainer bouncing back - what to do?
Good day, Although the maintainer is unreachable, the original developer, Gábor Csárdi, is an active member of the R programming community. You should write to him. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] Pandoc on Build Computers
Good day, In the vignette of ClassifyR, I have institute: The University of Sydney, Australia at the top. institute has been a valid entry since version 1.17 of Pandoc, released in 2016. Could Pandoc on the Bioconductor computers be updated? I notice that version 2.0.2 has been available since earlier this week. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] exonsBy dropping genes from TxDb
Good day,

I stepped through the code until execution reached the end of postForm in RCurl, which is called by getBM and obtains the textual result from the server. If I check the contents of write$value(), the example missing transcript is not there.

    Browse[3]> grep("ENST0485971", write$value())
    integer(0)

write$value is a weird function. Its prototype is function (collapse = "", ...) but its body contains code such as

    if (is.null(collapse)) return(txt)

I wonder where txt is created. It's not passed as an extra variable.

    Browse[7]> print(list(...))
    list()

Searching the R code reveals that txt is created outside write$value, in another function named dynCurlReader, by the statement txt <<- character(). The <<- operator assigns in an enclosing environment rather than in the calling function's own, which is why txt is visible without being passed in. RCurl also uses functions that don't begin with a dot but are undocumented.

    ans = encode(ans)
    Browse[7]> ?encode
    No documentation for ‘encode’ in specified packages and libraries

Anyway, the transcript ID is also missing from txt.

    Browse[7]> grep("ENST0485971", txt)
    integer(0)

It's hard to know what the obfuscated code of RCurl is doing.

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
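The pattern described above, where a function returns a variable it never receives as an argument, can be reproduced with an ordinary closure. This is a minimal sketch with hypothetical names (makeReader, update); it is not RCurl's code, only an illustration of how <<- and an enclosing environment interact.

```r
# Hypothetical reader, loosely modelled on the behaviour described in the post.
makeReader <- function() {
  txt <- character()                 # lives in makeReader's environment
  update <- function(chunk) {
    txt <<- c(txt, chunk)            # modifies txt in the enclosing environment
    invisible(NULL)
  }
  value <- function(collapse = "", ...) {
    if (is.null(collapse)) return(txt)   # txt is found by lexical scoping
    paste(txt, collapse = collapse)
  }
  list(update = update, value = value)
}

reader <- makeReader()
reader$update("first chunk;")
reader$update("second chunk")
combined <- reader$value()       # one string
pieces <- reader$value(NULL)     # the raw character vector
```

Inspecting environment(reader$value) in a debugger would show where txt is stored.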
Re: [Bioc-devel] how can I contribute to the success of great packages?
Good day, Thanks for the clarification. I appreciate your regular insights on the support forum over the years. It seems that Gviz will be stable enough to use, although the same maintainer's domainsignatures package has strikethrough across its name in the 3.6 build report, indicating its deprecation from Bioconductor. domainsignatures has no NEWS file explaining why it is being deprecated, so the deprecation seems unplanned and unintentional, and its end-users would have no advance notice if it later became defunct until they were faced with a failed biocLite installation command. I simply wish to avoid that situation with genomic plotting. Indeed, I wouldn't be as cautious if I were considering csaw, for example, and noticed build system warnings close to the deadline. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] Gviz Abandonware
Hello, Gviz hasn't been updated for the past two months but has a CHECK warning and there are almost no answered questions on the support website in the past three months. Is it worthwhile developing plotting functions based on Gviz if it is likely to become defunct next year? -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Why should Bioconductor developers re-use core classes?
Good day, It might be useful to readers to have a comparison table (ticks and crosses) in the MultiAssayExperiment vignette that compares the features available in it to those available in SummarizedExperiment, to allow quicker decision making. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] ShortRead readFasta UniProt Incorrect Import
Good day,

If I have a FASTA file that contains

    >sp|Q9NYW0|T2R10_HUMAN Taste receptor type 2 member 10 OS=Homo sapiens
    >GN=TAS2R10 PE=1 SV=3
    MLRVVEGIFIFVVVSESVFGVLGNGFIGLVNCIDCAKNKLSTIGFILTGLAISRIFLIWI
    IITDGFIQIFSPNIYASGNLIEYISYFWVIGNQSSMWFATSLSIFYFLKIANFSNYIFLW
    LKSRTNMVLPFMIVFLLISSLLNFAYIAKILNDYKTKNDTVWDLNMYKSEYFIKQILLNL
    GVIFFFTLSLITCIFLIISLWRHNRQMQSNVTGLRDSNTEAHVKAMKVLISFIILFILYF
    IGMAIEISCFTVRENKLLLMFGMTTTAIYPWGHSFILILGNSKLKQASLRVLQQLKCCEK
    RKNLRVT

readFasta imports it incorrectly, with the warning

    proteins <- readFasta('.', "test.fasta")
    Warning message:
    In .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec, :
      reading FASTA file test.fasta: ignored 129 invalid one-letter sequence codes

Also, the amino acid sequence is incomplete. There are 308 amino acids, but

    > width(proteins)
    [1] 178

It's undesirable for users that some amino acids are discarded. Hopefully, they notice the warning message before proceeding with the analysis. Admittedly, readFasta is in ShortRead, so it is designed to work with high-throughput sequencing reads. But perhaps it would be better suited to an infrastructure package such as Biobase and generalised to correctly import any FASTA file. There's even a Bioconductor workflow at https://www.bioconductor.org/help/workflows/sequencing/ which has a section titled "DNA/amino acid sequence from FASTA files" and demonstrates the use of readFasta. I used version 1.34.2 of ShortRead, which is the newest one.

------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
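A plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the file is decoded with the nucleotide alphabet, so amino acid letters that are not IUPAC nucleotide codes are dropped. The base R sketch below applies that filter to the first line of the sequence above to show how residues would be split into kept and discarded letters.

```r
# First 60 residues of the protein quoted in the post.
protein <- "MLRVVEGIFIFVVVSESVFGVLGNGFIGLVNCIDCAKNKLSTIGFILTGLAISRIFLIWI"
residues <- strsplit(protein, "")[[1]]

# IUPAC nucleotide letters. Assumption: a DNA-oriented reader accepts only
# these (plus gap symbols, which do not occur here).
dnaLetters <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
                "V", "H", "D", "B", "N")

kept <- residues[residues %in% dnaLetters]
dropped <- residues[!residues %in% dnaLetters]

# Letters such as L, F, I and E have no nucleotide meaning and fall out.
droppedLetters <- sort(unique(dropped))
```

For amino acid FASTA files, Biostrings provides readAAStringSet(), which may be a better fit than a reader aimed at sequencing reads.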
Re: [Bioc-devel] Why should Bioconductor developers re-use core classes?
Good day, I developed ClassifyR, which is a classification framework, based on ExpressionSet. Now that we're getting enquiries about inputting multiple datasets derived from the same patients, we plan to completely refactor the software to use MultiAssayExperiment as a foundation class. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] EXTERNAL: Fetching Upstream Permission Denied
Good day, Thanks for your help. In the end, export GIT_SSH_COMMAND='ssh -i ~/SSHkeys/digiOcean' did the trick. The write access is showing. R W packages/ClassifyR -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] Fetching Upstream Permission Denied
Good day,

I submitted my public key a couple of months ago and am now trying to do some maintenance. The code I used is:

    git clone https://github.com/DarioS/ClassifyR.git
    cd ClassifyR
    git remote add upstream g...@git.bioconductor.org:packages/ClassifyR.git
    git config core.sshCommand "ssh -i ~/SSHkeys/digiOcean"
    git checkout master
    git fetch upstream

but I get an error.

    Permission denied (publickey).
    fatal: Could not read from remote repository.

The key has the appropriate permissions.

    $ ls -l ~/SSHkeys/digiOcean
    -rw--- 1 dario dario 1675 Aug 5 2015 /home/dario/SSHkeys/digiOcean

Copying the private key to ~/.ssh/ does not help. How can I do it?

------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] MultiAssayExperiment Subsetting Fails if Column Data Has One Column
Good day,

Subsetting a MultiAssayExperiment object fails if the column data has exactly one column, but not if it has two or more. Perhaps drop = FALSE is missing for the DataFrame subsetting. A minimal example is:

    rowColNames <- list(paste0("Gene", 1:10), paste0("Person", 1:10))
    aTable <- matrix(rnorm(100), ncol = 10, dimnames = rowColNames)
    classes <- data.frame(row.names = paste0("Person", 1:10),
                          class = rep(c("Non-Responder", "Recovery"), each = 5))
    measurementsSet <- MultiAssayExperiment(list(RNA = aTable), classes)
    measurementsSet[1, 1, ]

    other attached packages:
    [1] S4Vectors_0.15.7 BiocGenerics_0.23.1 MultiAssayExperiment_1.3.34

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
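The suspected cause can be illustrated with a base data.frame, where row subsetting of a one-column table silently collapses to a vector unless drop = FALSE is given. Whether DataFrame behaves identically inside MultiAssayExperiment is an assumption; the sketch only shows the general mechanism, and the age column is hypothetical.

```r
classes <- data.frame(row.names = paste0("Person", 1:10),
                      class = rep(c("Non-Responder", "Recovery"), each = 5))

# One column: selecting rows drops the table structure and the row names.
collapsed <- classes[1:3, ]
# With drop = FALSE the result remains a one-column data.frame.
kept <- classes[1:3, , drop = FALSE]

# Two columns (the second is hypothetical): row subsetting keeps the table.
twoColumns <- cbind(classes, age = 41:50)
stillTable <- twoColumns[1:3, ]
```

Code that later asks for rownames() or ncol() of the subset would work for kept and stillTable but not for collapsed.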
Re: [Bioc-devel] ExperimentList Constructor Failing
Good day, Whatever the problem is, it's gone with R Under Development and all packages installed from the development branch of Bioconductor. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] ExperimentList Constructor Failing
Good day,

Although the package seems to build without errors, I can't run the basic examples of MultiAssayExperiment successfully.

    library(MultiAssayExperiment)
    > example("ExperimentList")

    ExprmL> ## Create an empty ExperimentList instance
    ExprmL> ExperimentList()
    Error in checkSlotAssignment(object, name, value) :
      assignment of an object of class “NULL” is not valid for slot ‘elementMetadata’ in an object of class “ExperimentList”; is(value, "DataTableORNULL") is not TRUE

Everything seems fine with the package check:

    > BiocInstaller::biocValid()

    * sessionInfo()

    R version 3.4.1 (2017-06-30)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.3 LTS

    Matrix products: default
    BLAS: /usr/lib/libblas/libblas.so.3.6.0
    LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

    locale:
     [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8
     [4] LC_COLLATE=en_AU.UTF-8 LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
     [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C LC_ADDRESS=C
    [10] LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] MultiAssayExperiment_1.2.1

    loaded via a namespace (and not attached):
     [1] Rcpp_0.12.12 BiocInstaller_1.26.1 compiler_3.4.1
     [4] GenomeInfoDb_1.12.2 plyr_1.8.4 XVector_0.16.0
     [7] bitops_1.0-6 tools_3.4.1 zlibbioc_1.22.0
    [10] digest_0.6.12 tibble_1.3.4 gtable_0.2.0
    [13] lattice_0.20-35 rlang_0.1.2 Matrix_1.2-11
    [16] DelayedArray_0.2.7 shiny_1.0.5 parallel_3.4.1
    [19] GenomeInfoDbData_0.99.0 gridExtra_2.3 stringr_1.2.0
    [22] UpSetR_1.3.3 S4Vectors_0.14.4 IRanges_2.10.3
    [25] stats4_3.4.1 grid_3.4.1 shinydashboard_0.6.1
    [28] glue_1.1.1 Biobase_2.36.2 R6_2.2.2
    [31] purrr_0.2.3 tidyr_0.7.1 magrittr_1.5
    [34] reshape2_1.4.2 ggplot2_2.2.1 scales_0.5.0
    [37] matrixStats_0.52.2 htmltools_0.3.6 BiocGenerics_0.22.0
    [40] GenomicRanges_1.28.5 SummarizedExperiment_1.6.3 mime_0.5
    [43] xtable_1.8-2 colorspace_1.3-2 httpuv_1.3.5
    [46] stringi_1.1.5 RCurl_1.95-4.8 lazyeval_0.2.0
    [49] munsell_0.4.3

    * Out-of-date packages
          Package LibPath                         Installed Built   ReposVer
    rJava "rJava" "/usr/local/lib/R/site-library" "0.9-8"   "3.2.3" "0.9-8"
          Repository
    rJava "https://cran.rstudio.com/src/contrib"

    update with biocLite()

    Error: 1 package(s) out of date

The same example works on another computer using the Windows operating system. What's the issue with this Linux environment?

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] how to verify upstream git changes
Good day, I like the idea of a commits log on the Bioconductor website. It was useful being able to see at a glance which packages have recently been changing. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] SnpSet Creation Function Prototype Not Valid R Code
Good day, It's formatted in monospace font, implying that it's R code, but new('SnpSet', phenoData = [AnnotatedDataFrame], experimentData = [MIAME], annotation = [character], protocolData = [AnnotatedDataFrame], call = [matrix], callProbability = [matrix], ...) is just pseudocode. Also, object creation using new is discouraged. Perhaps SnpSet could have a proper constructor, like ExpressionSet does? -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] git and public keys
Good day, Is the private key in a location other than the default SSH key folders? If so, use the ssh-add command to have the SSH agent know about it. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Generate valid SSH keys for the bioc-git server!
Good day, I filled out the form on Thursday, but can't fetch the repository. $ git fetch upstream Permission denied (publickey). fatal: Could not read from remote repository. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] Default Coverage Value
Hello, The coverage function is still inconvenient to use with a vector of weights to convert a GRanges metadata column into a RleList object. "The coverage method for GRanges could gain a default value argument." - Michael Lawrence, January 2013. "Something like coverage(foo, bar, ..., NA.value=-1)?" - Tim Triche, Jr., January 2013. Might this plan be restored (with a default value of 0 for backwards compatibility)? ------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] SplicingGraphs Feature Suggestion
Hello, It would be convenient if the colour or the width of the edges could be customised to represent whether an edge is equally present in two experimental conditions of an RNA-seq dataset or the degree to which it is enriched in one of them. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] pairwiseAlignment Improvements
Good day,

The location of indels can be retrieved from a PairwiseAlignmentsSingleSubject object by using indel. Determining any difference between the two sequences, including substitutions, is neither quick nor easy. I suppose that summary displays details of the mismatches, but the variable is of class PairwiseAlignmentsSingleSubjectSummary, which has no documented accessors. So, the code to access the information looks bad.

    summaryAlign@mismatchSummary[["subject"]]
      SubjectPosition Subject Pattern Count Probability
    1               2       T       A     1           1
    2               3       T       A     1           1

This could be improved with accessors for end users. Also, instead of being a data.frame, this would be better stored as IRanges with associated metadata columns, accessible with mcols, so that methods like reduce could easily be used to look for contiguous blocks of differences.

Is there a reason why the show method for the summary only shows mismatches, even if there are indels contained in it? This seems arbitrary and also misleading, because it always gives a false impression that there are no indels.

Could the return data types consistently be made to be IRanges? Sometimes it's IntegerList, sometimes it's IRanges. For example,

    > A
      11-letter "DNAString" instance
    seq: GAACGAGGACC
    > B
      8-letter "DNAString" instance
    seq: GGACGAGC
    > alignment <- pairwiseAlignment(A, B, gapOpening = 0, gapExtension = 1,
                                     substitutionMatrix = substitutions)
    > alignment@subject@mismatch
    IntegerList of length 1
    [[1]] 2
    > alignment@subject@indel
    IRangesList of length 1
    [[1]]
    IRanges of length 1
        start end width
    [1]     8   9     2

Lastly, why are functions like insertion, deletion, and indel documented in Numeric Summary Methods? Unlike nchar and score, they are not numerical summaries of the data. It'd be nice to see this part of Biostrings thoroughly refactored with more focus on UX.

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
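To make the "contiguous blocks of differences" idea concrete, here is a base R stand-in for what reduce would provide if the positions were ranges. The positions are hypothetical, and the sketch does not use Biostrings or IRanges.

```r
# Hypothetical mismatch positions along the subject sequence, sorted ascending.
positions <- c(2L, 3L, 7L, 10L, 11L, 12L)

# A new block starts wherever the gap to the previous position exceeds 1.
blockId <- cumsum(c(TRUE, diff(positions) > 1L))

# Collapse each block to its first and last position.
blocks <- data.frame(
  start = as.integer(tapply(positions, blockId, min)),
  end   = as.integer(tapply(positions, blockId, max))
)
blocks$width <- blocks$end - blocks$start + 1L
```

With range objects, the same result would come from a single call, which is the convenience the post is asking for.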
[Bioc-devel] ShortReadQ Serialisation Slow and Creates Large File
Good day,

Accidentally using save instead of writeFastq led me to notice how large a ShortReadQ object is on disk. A small set of reads

    > RNAreads
    class: ShortReadQ
    length: 42680 reads; width: 50..100 cycles

was saved two ways. As a text file, the reads take 11 MB uncompressed. But, when saved in binary format, the size on disk is 2.0 GB. Is a lot of unnecessary detail saved when the object is serialised?

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] GRangesList Conversion Fails For Unstranded Sequencing Data
Good day,

countOverlaps doesn't work for a GAlignmentPairs object with strandMode set to 0. This is because of an oversight in the grglist function. It has an if statement that checks whether the strand mode is 1 or 2. Then, it tries to subset the variable 'x_unlisted'. However, if strand mode is 0, neither conditional section of code is executed, so the 'x_unlisted' variable is never created and the following error occurs:

    Error in .local(x, use.names, use.mcols, ...) :
      object 'x_unlisted' not found

It's a surprise no one else has encountered this bug before.

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
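The failure mode described above can be reproduced in miniature. The functions below are hypothetical stand-ins, not the package's code: the first leaves a variable undefined when neither branch runs, and the second covers every mode explicitly. Treating mode 0 as unstranded ("*") is an assumption based on how strandMode is documented.

```r
# Hypothetical sketch of the bug: no branch handles strandMode 0.
sketchBuggy <- function(firstStrand, lastStrand, strandMode) {
  if (strandMode == 1) {
    x_unlisted <- firstStrand
  } else if (strandMode == 2) {
    x_unlisted <- lastStrand
  }
  x_unlisted                      # not defined when strandMode is 0
}

# Hypothetical repair: every supported mode assigns a value, others fail early.
sketchFixed <- function(firstStrand, lastStrand, strandMode) {
  switch(as.character(strandMode),
         "0" = rep("*", length(firstStrand)),   # assumed: unstranded
         "1" = firstStrand,
         "2" = lastStrand,
         stop("strandMode must be 0, 1 or 2"))
}

attempt <- try(sketchBuggy(c("+", "-"), c("-", "+"), 0), silent = TRUE)
unstranded <- sketchFixed(c("+", "-"), c("-", "+"), 0)
```

Here attempt holds a try-error complaining that x_unlisted was not found, mirroring the message in the post.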
Re: [Bioc-devel] edgeR and limma Default Offsets
Good day,

Now I notice the differences in how the prior counts are applied. In edgeR's cpm:

    prior.count.scaled <- lib.size/mean(lib.size) * prior.count
    lib.size <- lib.size + 2 * prior.count.scaled
    ......
    t(x) + prior.count.scaled

but in limma's voom:

    t(counts + 0.5)/(lib.size + 1)

Basically, the values added to the counts and the library size ignore the library size of each sample in the voom function.

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
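The difference can be seen numerically by re-implementing the two quoted fragments in base R. The function names are hypothetical, the scaling to counts per million and the log2 step are assumptions about the elided lines, and library sizes are taken as plain column sums; neither package is called.

```r
# Re-implementation of the edgeR-style fragment quoted above (assumed context).
edgeRStyleLogCPM <- function(x, prior.count = 0.25) {
  lib.size <- colSums(x)
  prior.count.scaled <- lib.size / mean(lib.size) * prior.count
  lib.size <- lib.size + 2 * prior.count.scaled
  # t(x) has samples in rows, so the per-sample vectors recycle correctly.
  t(log2((t(x) + prior.count.scaled) / lib.size * 1e6))
}

# Re-implementation of the voom-style fragment quoted above (assumed context).
voomStyleLogCPM <- function(counts) {
  lib.size <- colSums(counts)
  t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
}

# Hypothetical counts: equal library sizes, then a tenfold difference.
balanced <- matrix(c(10, 20, 30, 40, 40, 30, 20, 10), ncol = 2)
unbalanced <- matrix(c(10, 20, 30, 40, 100, 200, 300, 400), ncol = 2)

sameWhenBalanced <- isTRUE(all.equal(edgeRStyleLogCPM(balanced, 0.5),
                                     voomStyleLogCPM(balanced)))
sameWhenUnbalanced <- isTRUE(all.equal(edgeRStyleLogCPM(unbalanced, 0.5),
                                       voomStyleLogCPM(unbalanced)))
```

With a prior count of 0.5 the two sketches agree only when all library sizes are equal, which matches the observation in the post.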
[Bioc-devel] edgeR and limma Default Offsets
Good day, The cpm function in edgeR uses a default offset of 0.25 and voom in limma uses 0.5 (and provides no user modification) to calculate the base 2 logarithm of the counts per million. Might these be made consistent? -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] Alternative Hypothesis Specification For edgeR
Good day, In a future release, could the user be allowed to specify an alternative hypothesis such as the coefficient being positive? DESeq2 provides an altHypothesis parameter for such a purpose. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] readGAlignments Lacks strandMode
Good day, Now I know about invertStrand, I agree that it's best to keep the strandMode only for paired-end data. Indeed, it's an example at the end of the lengthy documentation of GAlignments. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] GAlignments Sorting Causes C Stack Error
Good day,

When sort is used on a GAlignments object, a stack error is shown, no matter how small the object is.

    > testAlignments
    GAlignments object with 3 alignments and 0 metadata columns:
                                            seqnames strand  cigar qwidth    start      end width njunc
    700666F:126:C8768ANXX:3:2204:3175:99484    chr14      + 71S27M     98 18386040 18386066    27     0
    700666F:126:C8768ANXX:1:1107:8115:31928    chr14      + 40S60M    100 18915005 18915064    60     0
    700666F:126:C8768ANXX:1:2206:7564:34686    chr14      + 40S50M     90 18915005 18915054    50     0
    ---
    seqinfo: 23 sequences from an unspecified genome
    > sort(testAlignments)
    Error: C stack usage 7970544 is too close to the limit

I use up-to-date packages.

    > sessionInfo()
    R version 3.3.2 (2016-10-31)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Debian GNU/Linux 8 (jessie)

    locale:
     [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8
     [6] LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
    [11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats4 parallel stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] GenomicAlignments_1.10.0 Rsamtools_1.26.1 Biostrings_2.42.1 XVector_0.14.0
    [5] SummarizedExperiment_1.4.0 Biobase_2.34.0 GenomicRanges_1.26.2 GenomeInfoDb_1.10.1
    [9] IRanges_2.8.1 S4Vectors_0.12.1 BiocGenerics_0.20.0

    loaded via a namespace (and not attached):
    [1] lattice_0.20-34 bitops_1.0-6 grid_3.3.2 zlibbioc_1.20.0 Matrix_1.2-7.1 BiocParallel_1.8.1
    [7] tools_3.3.2

------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] readGAlignments Lacks strandMode
Good day, readGAlignmentPairs has strandMode but readGAlignments doesn't, which means that single-end strand-specific RNA-seq that generates sequences on the opposite strand to the gene needs a subsequent ifelse statement. The API could be more consistent by providing a strandMode option for readGAlignments and other similar functions in GenomicAlignments. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
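The "subsequent ifelse statement" mentioned above might look like the following when written against a plain character vector of strands. The input is hypothetical; on real alignment objects the strand would first be extracted and then assigned back.

```r
# Hypothetical strands of single-end reads from a reverse-stranded protocol.
readStrand <- c("+", "-", "*", "+")

# Swap "+" and "-" so that each read reports the strand of its gene;
# unstranded entries ("*") are left as they are.
geneStrand <- ifelse(readStrand == "+", "-",
                     ifelse(readStrand == "-", "+", "*"))
```

A strandMode-like option would hide this step inside the import function, which is the consistency the post is asking for.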
Re: [Bioc-devel] readGAlignmentPairs Fails if Used Inside mclapply Loop
Hello, I fixed the suggested test case, since the command didn't specify the connection and produced an error. It appears to work without a problem. > bamFile <- mappedReadsGenome[2] > length(serialize(readGAlignmentPairs(bamFile, strandMode=2), NULL)) [1] 1329295005 I also tried mc.cores = 2 and it also resulted in an error. Each of the files has 30 to 40 million mappings, so I wouldn't expect them to be too big. I'll stick to bplapply. ------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
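The check used above measures how many bytes an object occupies once serialized, which is the form in which a forked worker hands its result back to the parent process. A small base R version of the same measurement, using a stand-in object rather than real alignments, is sketched below.

```r
# Stand-in object; the post measured a GAlignmentPairs object in the same way.
standIn <- data.frame(start = 1:1000, end = 1001:2000)

# serialize(x, NULL) returns a raw vector, so its length is the size in bytes.
bytes <- length(serialize(standIn, NULL))
mebibytes <- bytes / 2^20

# For scale, the figure reported in the post converted to GiB.
reportedGiB <- 1329295005 / 2^30
```

The reported object is therefore a little over 1.2 GiB per file when serialized.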
[Bioc-devel] readGAlignmentPairs Fails if Used Inside mclapply Loop
Good day,

I found that readGAlignmentPairs fails when used inside an mclapply loop but not an sapply loop. I haven't had such problems with other functions when using mclapply.

    > class(mappedToGenomeFiles)
    [1] "character"
    > length(mappedToGenomeFiles)
    [1] 13
    > mappedReadsGenome <- sapply(mappedToGenomeFiles, function(bamFile) {
          readGAlignmentPairs(bamFile, strandMode = 2)
      }) # No error. Each item is of GAlignmentPairs class.

But, with mclapply:

    > mappedReadsGenome <- mclapply(mappedToGenomeFiles, function(bamFile) {
          readGAlignmentPairs(bamFile, strandMode = 2)
      }, mc.cores = 7)
    Warning message:
    In mclapply(mappedToGenomeFiles, function(bamFile) { :
      scheduled cores 6, 5, 3, 1, 4, 2 encountered errors in user code, all values of the jobs will be affected
    > mappedReadsGenome
    [[1]]
    [1] "fatal error in wrapper code"
    attr(,"class")
    [1] "try-error"

    [[2]]
    [1] "fatal error in wrapper code"
    attr(,"class")
    [1] "try-error"
    . . .
    [[7]]
    GAlignmentPairs object with 41860576 pairs, strandMode=2, and 0 metadata columns:
                 seqnames strand :                 ranges --                 ranges
             [1]    chr14      + : [19010525, 19010623] -- [19010414, 19010513]
             [2]    chr14      + : [19010543, 19010612] -- [19010505, 19010604]
             [3]    chr14      + : [19010608, 19010707] -- [19010577, 19010676]
             [4]    chr14      + : [19011187, 19011286] -- [19011142, 19011241]
             [5]    chr14      + : [19011318, 19011415] -- [19011187, 19011286]
             ...      ...    ... ...                ... ...                 ...
      [41860572]     chr4      + : [190972787, 190972886] -- [190972685, 190972784]
      [41860573]     chr4      - : [190974302, 190974385] -- [190974302, 190974385]
      [41860574]     chr4      - : [190978480, 190978579] -- [190978542, 190978641]
      [41860575]     chr4      - : [190982116, 190982215] -- [190982125, 190982224]
      [41860576]     chr4      + : [191031678, 191031776] -- [191031630, 191031729]
      ---
      seqinfo: 25 sequences from an unspecified genome
    . . .
    [[13]]
    [1] "fatal error in wrapper code"
    attr(,"class")
    [1] "try-error"

Interestingly, reading in from one of the thirteen file paths worked.
In contrast, a simple test case of the same length works:

    X = 1:13
    mclapply(X, function(x) x + 1, mc.cores = 7) # Prints 2:14.

The BAM file import also works with bplapply and BPPARAM = MulticoreParam(workers = 7).

    > sessionInfo()
    R version 3.3.2 (2016-10-31)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Debian GNU/Linux 8 (jessie)

    locale:
     [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
     [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats4 parallel stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] GenomicAlignments_1.10.0 SummarizedExperiment_1.4.0 GenomicFeatures_1.26.0 AnnotationDbi_1.36.0 Biobase_2.34.0
     [6] Rsamtools_1.26.1 Biostrings_2.42.0 XVector_0.14.0 GenomicRanges_1.26.1 GenomeInfoDb_1.10.1
    [11] IRanges_2.8.1 S4Vectors_0.12.0 BiocGenerics_0.20.0

    loaded via a namespace (and not attached):
     [1] zlibbioc_1.20.0 BiocParallel_1.8.1 lattice_0.20-34 tools_3.3.2 grid_3.3.2 DBI_0.5-1 Matrix_1.2-7.1
     [8] rtracklayer_1.34.1 bitops_1.0-6 RCurl_1.95-4.8 biomaRt_2.30.0 RSQLite_1.0.0 XML_3.98-1.5

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
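Because mclapply returns try-error objects in place of failed results instead of stopping, it can help to check for them before using the list. The sketch below simulates failures with stop() on a forking platform (it will not run with more than one core on Windows); the real failure in the post, "fatal error in wrapper code", has a different cause that this example does not reproduce.

```r
library(parallel)

# Simulated job: even inputs fail, odd inputs succeed.
results <- mclapply(1:4, function(i) {
  if (i %% 2 == 0) stop("simulated failure") else i * 10
}, mc.cores = 2)

# Flag the elements that came back as errors rather than values.
failed <- vapply(results, inherits, logical(1), what = "try-error")
succeeded <- results[!failed]
```

Only the elements in succeeded should be passed on; the indices in which(failed) identify the inputs to retry or investigate.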
[Bioc-devel] Implementation of vmatchPattern Indels
Good day, I notice that with.indels has been a parameter to vmatchPattern for almost a decade but is only a stub. I am hoping that this suggestion could put it into future development plans so the underlying functionality could be implemented soon. It could be a useful option for preprocessing of CRISPR genomic screens without leaving the R analysis environment, which is a new use case. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] vmatchPattern Returns Out of Bounds Indices
Good day, > These questions really belong to the support site. I suppose, although it seemed like an unexpected issue at first because it's not documented within ?lowlevel-matching so users don't know what to expect. > You'll get that behaviour by allowing indels. This reveals a discrepancy between the documentation and the way the function operates. In the documentation, the function definition of vmatchPattern has with.indels = FALSE in it. However, changing it to TRUE results in Error in .XStringSet.vmatchPattern(pattern, subject, max.mismatch, min.mismatch, : vmatchPattern() does not support indels yet This is utilising Biostrings 2.42.0 in R 3.3.1. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] vmatchPattern Returns Out of Bounds Indices
Hello,

If using vmatchPattern to find a sequence in another sequence, the resulting end index can be beyond the width of the corresponding sequence in the subject XStringSet. For example:

    forwardPrimer <- "TCTTGTGGAAAGGACGAAACACCG"
    > range(width(reads))
    [1] 75 75
    primerLocations <- vmatchPattern(forwardPrimer, reads, max.mismatch = 1)
    > range(unlist(endIndex(primerLocations)))
    [1] 23 76

This causes problems if using extractAt to obtain the sequences within each read. For example:

    sequences = extractAt(reads, locations)
    Error in .normarg_at2(at, x) :
      some ranges in 'at' are off-limits with respect to their corresponding sequence in 'x'

It's rare, but still a problem, nonetheless.

    > table(unlist(endIndex(primerLocations)) > 75)
     FALSE   TRUE
    366225      2

This happens with Biostrings 2.42.0.

------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
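One way to avoid the extraction error is to discard matches that hang over either end of a read before extracting. The sketch below shows the bounds check on plain integer vectors with hypothetical coordinates; applying it to the actual match object would need the corresponding accessors.

```r
# Hypothetical match coordinates for three reads, each 75 letters wide.
readWidth <- c(75L, 75L, 75L)
matchStart <- c(1L, 53L, 0L)
matchEnd <- matchStart + 23L     # the primer in the post is 24 letters long

# A match is usable only if it lies entirely within its read.
inBounds <- matchStart >= 1L & matchEnd <= readWidth
usableStart <- matchStart[inBounds]
usableEnd <- matchEnd[inBounds]
```

The second match ends at position 76 and the third starts at position 0, so only the first survives the filter.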
[Bioc-devel] Feasibility of Parallel Extraction of Matches with extractAllMatches
Good day,

I'd like to request that extractAllMatches work when subject is an XStringSet. The function could check that subject and mindex have the same length and then process them in parallel. Currently, the following example isn't immediately possible.

    words <- BStringSet(c("xxGOATzz", "xxMOATzz", "xxNOTEzz"))
    matches <- vmatchPattern("GOAT", words, max.mismatch = 1)
    similarWords <- extractAllMatches(words, matches) # Not possible.

Could that be implemented for the next release of Biostrings? Or, perhaps it can be deprecated since it duplicates the functionality of substr?

    > substr(words, start(matches), end(matches))
    [1] "GOAT" "MOAT" NA

Also, the expected subsetting fails for MIndex objects.

    > class(matches)
    [1] "ByPos_MIndex"
    > length(matches)
    [1] 3
    > length(matches[1])
    [1] 3

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] Subversion Log Stalled
Good day, The log at http://bioconductor.org/developers/svnlog/ stopped updating two weeks ago. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] Bioc-devel Digest, Vol 152, Issue 10
Good day,

The problem is simply caused by an incorrectly typed file path. The error message isn't clear about this and describes some temporary file name (based on today's date and time), which is confusing. Perhaps the importFusionData function could be made more robust by checking for the file's existence at the beginning of the function. For example,

    if (file.exists(filename)) {
        # Do fusion file import.
    } else {
        stop("Could not find the specified fusion file.")
    }

------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] chimera Attempts to Open Non-existent File
Good day, The examples section of importFusionData is almost entirely commented out, so it's unclear whether it works. Since a lot of the package code is never run by R CMD check and the test coverage is 0%, it's plausibly a package development issue. - Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] chimera Attempts to Open Non-existent File
Good day,

The importFusionData function fails trying to open a file that doesn't exist.

    > fd = importFusionData("star", "/verona/nobackup/biostat/datasets/melanoma/AAHChimeric.out.junction")
    Error in file(file, "rt") : cannot open the connection
    In addition: Warning message:
    In file(file, "rt") :
      cannot open file 'Thu_Nov_3_13-46-27_2016': No such file or directory

The section of code where the error occurs seems to be in the .starImport function.

------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] FlipFlop Ignores Read Strand and Requires Antiquated File Formats
Hello, The package FlipFlop is made for isoform quantitation. Why are there no options to specify the RNA-seq read strand? Without them, the method produces incorrect counts where overlapping genes on both strands are being transcribed. Also, the software requires a SAM file as input. This is inefficient, since most mapping results are stored as BAM files. It would be better if FlipFlop made more use of the import and export functions available in Rsamtools. Also, requiring the gene database to be in BED12 format creates more unnecessary work for the user. ENSEMBL and GENCODE both provide GTF and GFF3 files, which can easily be imported into R with functions provided by rtracklayer. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
[Bioc-devel] ignoreSelf Option for findOverlaps of GenomicRanges Query
Good day, For an IRanges object, findOverlaps has ignoreSelf and ignoreRedundant options. However, these aren't available for a GenomicRanges input object, even though the subject parameter is optional and a query GRanges object can be overlapped with itself. Could this be changed to be consistent? -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
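Until such options exist for GRanges input, the same effect can be approximated by filtering the returned Hits object. The following is a minimal sketch with made-up ranges. Note that the second filter drops self hits as well as mirrored pairs, so it corresponds to using both options together.

```r
library(GenomicRanges)

# Three toy ranges on one chromosome; the first two overlap each other.
gr <- GRanges("chr1", IRanges(start = c(1, 5, 20), end = c(10, 15, 30)))

# Overlap the object with itself, then filter the hits manually.
hits <- findOverlaps(gr, gr)

# Equivalent in spirit to ignoreSelf: drop hits of a range with itself.
noSelf <- hits[queryHits(hits) != subjectHits(hits)]

# Drop self hits and keep only one of each mirrored pair (A-B but not B-A).
noRedundant <- hits[queryHits(hits) < subjectHits(hits)]
```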
Re: [Bioc-devel] NEWS files
Good day, I see it, too. There's no problem. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
Re: [Bioc-devel] NEWS files
Good day, ClassifyR has a NEWS file, but I don't see any link to it on ClassifyR's webpage. There are no warnings or errors during the checking process. What is causing it to be missed? I also tried news(package = "ClassifyR") and it renders well, although the R logo is gigantic. ------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
[Bioc-devel] VCF Intersection Using readVcf Remarkably Slow
Good day,

file <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
anotherFile <- system.file("extdata", "hapmap_exome_chr22.vcf.gz", package = "VariantAnnotation")
aSet <- readVcf(file, "hg19")
system.time(commonMutations <- readVcf(anotherFile, "hg19", rowRanges(aSet)))
   user  system elapsed
209.120  16.628 226.083

Reading in the Exome chromosome 22 VCF and intersecting it with the other file in the data directory takes almost 4 minutes. However, reading in the whole file is much faster.

> system.time(anotherSet <- readVcf(anotherFile, "hg19"))
   user  system elapsed
  0.376   0.016   0.392

and doing the intersection manually takes a fraction of a second

> system.time(fastCommonMutations <- intersect(rowRanges(aSet), rowRanges(anotherSet)))
   user  system elapsed
  0.128   0.000   0.129

This comparison ignores the finer details such as the identities of the alleles, but does it have to be so slow? My real use case is intersecting dozens of VCF files of cancer samples with the ExAC consortium's VCF file that is 4 GB in size when compressed. I can't imagine how long that would take. Can the code of readVcf be optimised?

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
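For files small enough to hold in memory, one possible workaround is to read the whole file and then subset the records by overlap. This is only a sketch under that assumption: it would not help with a 4 GB file, and like the manual intersection in the post it compares positions only, not alleles. Unlike intersect(), it returns the VCF records rather than the intersected ranges.

```r
library(VariantAnnotation)

file <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
anotherFile <- system.file("extdata", "hapmap_exome_chr22.vcf.gz", package = "VariantAnnotation")

# Read both files completely; this was the fast path in the timings reported in the post.
aSet <- readVcf(file, "hg19")
anotherSet <- readVcf(anotherFile, "hg19")

# Keep only the records of anotherSet whose ranges overlap a variant in aSet.
commonRecords <- subsetByOverlaps(anotherSet, rowRanges(aSet))
```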
[Bioc-devel] Warning when Reading Example VCF
Good day,

When importing a VCF file from VariantAnnotation's data directory into R, a warning is emitted.

library(VariantAnnotation)
aFile <- system.file("extdata", "hapmap_exome_chr22.vcf.gz", package = "VariantAnnotation")
aSet <- readVcf(aFile, "hg19")
Warning message:
In .bcfHeaderAsSimpleList(header) :
  duplicate keys in header will be forced to unique rownames

Is there some problem with the format of one of the VCF files distributed with VariantAnnotation? I wouldn't expect any package data files to emit warnings to the end user.

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10
VariantAnnotation 1.18.7

------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
Re: [Bioc-devel] Why sleuth is not in Bioconductor?
Hello, That contradicts the instructions on the sleuth Download page. It contains the R commands biocLite("rhdf5") and devtools::install_github("pachterlab/sleuth"). You may have read the instructions too quickly and mixed the arguments up. Bioconductor requires lots of function documentation with runnable examples and a vignette. Sleuth isn't currently at the R package quality level necessary for Bioconductor. ------ Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
[Bioc-devel] String Matching in Parallel
Hello, Functions such as vmatchPattern and vmatchPDict naturally lend themselves to being parallelised. Could they be enhanced to accept a BiocParallelParam object? Or is there no significant performance difference between that and using them as-is inside a surrounding bplapply loop that repeatedly calls DNAString or DNAStringSet? (It's odd that vmatchPattern, for searching BSgenome objects, requires a DNAString for the pattern rather than a DNAStringSet.) -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
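The bplapply alternative mentioned in this post might look roughly like the sketch below. The sequences and patterns are made up, a DNAStringSet stands in for a BSgenome subject, and SerialParam() is used only to keep the example simple. Whether this performs comparably to a built-in parallel option is exactly the open question.

```r
library(Biostrings)
library(BiocParallel)

# Toy subject sequences and patterns (illustrative values only).
subject <- DNAStringSet(c(seq1 = "ACGTACGTTTGA", seq2 = "TTGACCACGT"))
patterns <- c("ACGT", "TTGA")

# One vmatchPattern call per pattern, dispatched by bplapply.
# A parallel back-end such as MulticoreParam() could be swapped in for SerialParam().
matches <- bplapply(patterns, function(pattern, subject) {
  Biostrings::vmatchPattern(pattern, subject)
}, subject = subject, BPPARAM = SerialParam())
```

Each element of `matches` holds the match positions of one pattern across all subject sequences.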
Re: [Bioc-devel] BStringSet Documentation
Hello, Actually, I thought that substr unintentionally worked and perhaps they should both produce an error message. Thanks for adding the functionality for strsplit, though! -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
[Bioc-devel] BStringSet Documentation
Good day,

According to the documentation, I wouldn't think that substr or strsplit would work on a BStringSet, but substr does.

> IDs
  A BStringSet instance of length 5
    width seq
[1]    61 D00626:168:C9CWMANXX:1:1105:1816:1998 1:N:0:TCCGGAGA+ATAGAGGC
[2]    61 D00626:168:C9CWMANXX:1:1105:2113:1989 1:N:0:TCCGGAGA+ATAGAGGC
[3]    61 D00626:168:C9CWMANXX:1:1105:2703:1986 1:N:0:TCCGGAGA+ATAGAGGC
[4]    61 D00626:168:C9CWMANXX:1:1105:3255:1979 1:N:0:TCCGGAGA+ATAGAGGC
[5]    61 D00626:168:C9CWMANXX:1:1105:4525:1995 1:N:0:TCCGGAGA+ATAGAGGC
> substr(IDs, 1, 37)
[1] "D00626:168:C9CWMANXX:1:1105:1816:1998"
[2] "D00626:168:C9CWMANXX:1:1105:2113:1989"
[3] "D00626:168:C9CWMANXX:1:1105:2703:1986"
[4] "D00626:168:C9CWMANXX:1:1105:3255:1979"
[5] "D00626:168:C9CWMANXX:1:1105:4525:1995"
> strsplit(IDs, ' ')
Error in strsplit(IDs, " ") : non-character argument

I think that either both of these functions should work or neither should, to be consistent.

-- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia
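Where strsplit does not accept a BStringSet directly, converting to a plain character vector first sidesteps the error. A minimal sketch, using two of the read IDs from this post:

```r
library(Biostrings)

IDs <- BStringSet(c("D00626:168:C9CWMANXX:1:1105:1816:1998 1:N:0:TCCGGAGA+ATAGAGGC",
                    "D00626:168:C9CWMANXX:1:1105:2113:1989 1:N:0:TCCGGAGA+ATAGAGGC"))

# Convert to character explicitly, then split on the space between the two fields.
parts <- strsplit(as.character(IDs), " ", fixed = TRUE)

# Take the first field (the read name) of every ID.
readNames <- vapply(parts, `[`, character(1), 1)
```

The explicit as.character call makes the conversion visible, at the cost of copying the strings out of the BStringSet.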
[Bioc-devel] Repitools Development Version Webpage Gone
Recently, the overview webpage of the development version of Repitools has vanished. It is still listed in the build report, though. There are also some strange build errors on Linux. -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia