Re: [Rd] Request to speed up save()

2015-01-15 Thread Dénes Tóth
On 01/15/2015 01:45 PM, Stewart Morris wrote: Hi, I am dealing with very large datasets and it takes a long time to save a workspace image. The options to save compressed data are: gzip, bzip2 or xz, the default being gzip. I wonder if it's possible to include the pbzip2

[Rd] Closing over Garbage

2015-01-15 Thread Christian Sigg
Given a large data.frame, a function trains a series of models by looping over two steps:
1. Create a model-specific subset of the complete training data
2. Train a model on the subset data
The function returns a list of trained models which are later used for prediction on test data. Due to
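The message is cut off before it shows any code, so the following is only a sketch of one way the two-step loop can end up "closing over garbage". It assumes each trained model is returned as a closure. All names (train_leaky, train_lean, make_predictor) and the trivial mean "model" are hypothetical and do not come from the thread.

```r
set.seed(1)
# Toy training data; the layout is an assumption made for illustration.
train <- data.frame(group = rep(c("a", "b"), each = 5), y = rnorm(10))

# Leaky variant: the returned closure is created inside the loop body,
# so its environment still holds subset_data (and, via the parent, data).
train_leaky <- function(data, groups) {
  lapply(groups, function(g) {
    subset_data <- data[data$group == g, ]        # step 1: subset
    center <- mean(subset_data$y)                 # step 2: "train"
    function(newdata) rep(center, nrow(newdata))  # predictor closure
  })
}

# Lean variant: the closure is built in a helper whose environment
# contains only what prediction needs.
make_predictor <- function(center) {
  force(center)
  function(newdata) rep(center, nrow(newdata))
}

train_lean <- function(data, groups) {
  lapply(groups, function(g) {
    subset_data <- data[data$group == g, ]
    make_predictor(mean(subset_data$y))
  })
}

leaky <- train_leaky(train, c("a", "b"))
lean  <- train_lean(train, c("a", "b"))

print(ls(environment(leaky[[1]])))  # objects kept alive by the leaky closure
print(ls(environment(lean[[1]])))   # objects kept alive by the lean closure
```

Both variants predict the same values; they differ only in what stays reachable from the returned list after training.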

Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Michael Lawrence
If it's not documented, it should be, because Patrick did it on purpose (the output from the IntervalTree code is not sorted). We could add an argument to disable the sorting for when the extra speed is desired. But it has proven useful. On Thu, Jan 15, 2015 at 6:42 AM, Kasper Daniel Hansen

Re: [Rd] Closing over Garbage

2015-01-15 Thread luke-tierney
On Thu, 15 Jan 2015, Christian Sigg wrote: Given a large data.frame, a function trains a series of models by looping over two steps:
1. Create a model-specific subset of the complete training data
2. Train a model on the subset data
The function returns a list of trained models which are

Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Hervé Pagès
Hi guys, Indeed, the Hits object returned by findOverlaps() is not fully sorted anymore. Now it's sorted by query hit *only* and not by query hit *and* subject hit. Fully sorting a big Hits object has a high cost, both in terms of time and memory footprint. The partial sorting is *much* cheaper:
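For user code that relied on the old full ordering, a minimal base-R sketch of the difference follows. The hit pairs are made up and held in a plain data.frame so the example runs without Bioconductor. With a real Hits object the two columns would come from queryHits() and subjectHits(). The helper same_hits is hypothetical.

```r
# Hypothetical hit pairs, sorted by query only (values are made up).
hits <- data.frame(query   = c(1L, 1L, 2L, 2L, 2L),
                   subject = c(7L, 3L, 9L, 2L, 5L))

# The partial order holds: query is non-decreasing.
stopifnot(!is.unsorted(hits$query))

# Restore the full (query, subject) order where downstream code needs it.
full <- hits[order(hits$query, hits$subject), ]
rownames(full) <- NULL

# Order-insensitive comparison, e.g. for a test that should not
# depend on how the hits happen to be sorted.
same_hits <- function(a, b) {
  key <- function(h) sort(paste(h$query, h$subject))
  identical(key(a), key(b))
}

print(full)
print(same_hits(hits, full))
```

Sorting explicitly on both columns costs the extra time and memory the message mentions, so it only makes sense where the subject order actually matters.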

Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Michael Lawrence
My concern is mostly in user code not seen in Bioc svn. But perhaps the partial sorting (by query) is sufficient for many of those. On Thu, Jan 15, 2015 at 11:34 AM, Hervé Pagès hpa...@fredhutch.org wrote: Hi guys, Indeed, the Hits object returned by findOverlaps() is not fully sorted

[Rd] default min-v/nsize parameters

2015-01-15 Thread Michael Lawrence
Just wanted to start a discussion on whether R could ship with more appropriate GC parameters. Right now, loading the recommended package Matrix leads to:
library(Matrix)
gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1076796 57.6    1368491 73.1  1198505 64.1
Vcells 1671329 12.8
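To check how close a session sits to its GC trigger, the matrix returned by gc() can be inspected directly. This is only a sketch: the headroom ratio is an ad hoc measure introduced here, and the startup values in the comment are illustrative, not a recommendation from the thread.

```r
# gc() returns a matrix with rows Ncells and Vcells.
g <- gc()

# Current usage against the level at which the next collection is triggered.
usage <- g[, c("used", "gc trigger")]
print(usage)

# Fraction of the trigger already in use (ad hoc measure, assumption).
headroom <- usage[, "used"] / usage[, "gc trigger"]
print(round(headroom, 2))

# The initial sizes can be raised at startup from the shell, for example:
#   R --min-nsize=2M --min-vsize=64M
# (values chosen for illustration only)
```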

Re: [Rd] Request to speed up save()

2015-01-15 Thread Nathan Kurz
On Thu, Jan 15, 2015 at 11:08 AM, Simon Urbanek simon.urba...@r-project.org wrote: In addition to the major points that others made: if you care about speed, don't use compression. With today's fast disks it's an order of magnitude slower to use compression: d=lapply(1:10, function(x)
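The quoted timing code is cut off above, so what follows is a separate minimal sketch of the same comparison, not a reconstruction of it. The data size and file names are assumptions. Only the compress argument of save() is used, and timings will vary with disk and CPU.

```r
# Illustrative data; size and contents are assumptions.
d <- lapply(1:10, function(i) rnorm(1e5))

f_gz  <- tempfile(fileext = ".RData")
f_raw <- tempfile(fileext = ".RData")

# Default save() uses gzip compression; compress = FALSE turns it off.
t_gz  <- system.time(save(d, file = f_gz))
t_raw <- system.time(save(d, file = f_raw, compress = FALSE))

# Elapsed seconds and resulting file sizes in bytes.
print(c(gzip = t_gz[["elapsed"]], none = t_raw[["elapsed"]]))
print(c(gzip = file.size(f_gz), none = file.size(f_raw)))

# Both files load back to the same object.
e <- new.env()
load(f_raw, envir = e)
```

The trade-off is disk space against time, so the uncompressed variant pays off mainly when storage is fast and plentiful.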

Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Dario Strbenac
The order of results is not important for the analysis. I have updated the test case with a new expected result. -- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia

[Bioc-devel] ClassifyR Fails to Build on Windows

2015-01-15 Thread Dario Strbenac
Hello, The development version of ClassifyR won't build on Windows. It happens for a code section in the vignette that executes a function that has a bpmapply loop. However, I'm using the default parameters by calling bpparam(), so it should work on Windows. The code in the vignette executes

Re: [Bioc-devel] ClassifyR Fails to Build on Windows

2015-01-15 Thread Dan Tenenbaum
There is no shared memory on Windows so you need to make sure you require() any necessary packages on each node. Dan On January 15, 2015 5:00:22 PM PST, Dario Strbenac dstr7...@uni.sydney.edu.au wrote: Hello, The development version of ClassifyR won't build on Windows. It happens for a code
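The point about loading packages on each node can be illustrated with the base parallel package, used here in place of BiocParallel so the sketch runs without Bioconductor. The choice of splines as the example package and of two workers is arbitrary. With bpmapply the analogous step would be a library() or require() call inside the worker function.

```r
library(parallel)
library(splines)            # attached in the master session only

# Socket workers are fresh R processes; they do not inherit the
# master's attached packages.
cl <- makeCluster(2L)

before <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

# Attach the package on every worker explicitly.
invisible(clusterEvalQ(cl, library(splines)))

after <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

stopCluster(cl)

print(before)
print(after)
```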

[Rd] Request to speed up save()

2015-01-15 Thread Stewart Morris
Hi, I am dealing with very large datasets and it takes a long time to save a workspace image. The options to save compressed data are: gzip, bzip2 or xz, the default being gzip. I wonder if it's possible to include the pbzip2 (http://compression.ca/pbzip2/) algorithm as an option when

Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Hervé Pagès
Hi Michael, On 01/15/2015 11:59 AM, Michael Lawrence wrote: My concern is mostly in user code not seen in Bioc svn. I understand but the fate of that code is to get out of sync sooner or later. And sooner rather than later if it relies on undocumented behavior. But perhaps the partial