date:20150115

Re: [Rd] Request to speed up save()

2015-01-15 Thread Dénes Tóth

On 01/15/2015 01:45 PM, Stewart Morris wrote: Hi, I am dealing with very large datasets and it takes a long time to save a workspace image. The options to save compressed data are: gzip, bzip2 or xz, the default being gzip. I wonder if it's possible to include the pbzip2

[Rd] Closing over Garbage

2015-01-15 Thread Christian Sigg

Given a large data.frame, a function trains a series of models by looping over two steps:
1. Create a model-specific subset of the complete training data
2. Train a model on the subset data
The function returns a list of trained models which are later used for prediction on test data. Due to
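The preview above is cut off before it states the cause, so the following is only a guess at the kind of effect the thread title points to: a minimal R sketch of how a "model" returned as a closure can keep the training data alive. All names here (`make_predictor`, `make_lean_predictor`, `big`) are hypothetical and the "training" step is a stand-in.

```r
# Hypothetical illustration; none of these names come from the thread.
make_predictor <- function(data) {
  subset_data <- data[data$g == 1, ]          # step 1: model-specific subset
  coef <- mean(subset_data$y)                 # step 2: stand-in for training
  function(newdata) rep(coef, nrow(newdata))  # closure returned as the "model"
}

# Same steps, but the large objects are dropped before the closure is returned.
make_lean_predictor <- function(data) {
  subset_data <- data[data$g == 1, ]
  coef <- mean(subset_data$y)
  rm(data, subset_data)
  function(newdata) rep(coef, nrow(newdata))
}

big <- data.frame(g = rep(1:2, each = 5e4), y = rnorm(1e5))
pred <- make_predictor(big)
lean <- make_lean_predictor(big)

# Names still reachable from each closure's environment.
captured      <- ls(environment(pred))
captured_lean <- ls(environment(lean))
print(captured)
print(captured_lean)
```

In the first version the closure's environment still references both the full data and the subset, so they cannot be garbage collected while the model is kept; the second version keeps only the fitted value.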

Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Michael Lawrence

If it's not documented, it should be, because Patrick did it on purpose (the output from the IntervalTree code is not sorted). We could add an argument to disable the sorting for when the extra speed is desired. But it has proven useful. On Thu, Jan 15, 2015 at 6:42 AM, Kasper Daniel Hansen

Re: [Rd] Closing over Garbage

2015-01-15 Thread luke-tierney

On Thu, 15 Jan 2015, Christian Sigg wrote: Given a large data.frame, a function trains a series of models by looping over two steps:
1. Create a model-specific subset of the complete training data
2. Train a model on the subset data
The function returns a list of trained models which are

Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Hervé Pagès

Hi guys, Indeed, the Hits object returned by findOverlaps() is not fully sorted anymore. Now it's sorted by query hit *only* and not by query hit *and* subject hit. Fully sorting a big Hits object has a high cost, both in terms of time and memory footprint. The partial sorting is *much* cheaper:
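The difference between the two orderings described above can be sketched in base R. This is not the IRanges implementation and does not reproduce the elided part of the message; the hit pairs are made-up values and `order()` is used only as a stand-in.

```r
# Toy hit pairs; the values are made up for illustration.
query   <- c(2L, 1L, 2L, 1L, 3L)
subject <- c(5L, 9L, 1L, 4L, 7L)

# Partial sort: order by query hit only (ties keep their original order).
by_query <- order(query)
# Full sort: order by query hit, then by subject hit within each query.
by_both  <- order(query, subject)

partial <- data.frame(query = query[by_query], subject = subject[by_query])
full    <- data.frame(query = query[by_both],  subject = subject[by_both])

print(partial)
print(full)
```

Both results are sorted by query, but only `full` has the subject hits in increasing order within each query. Code that relies on the subject order therefore needs to sort explicitly.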

Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Michael Lawrence

My concern is mostly in user code not seen in Bioc svn. But perhaps the partial sorting (by query) is sufficient for many of those. On Thu, Jan 15, 2015 at 11:34 AM, Hervé Pagès hpa...@fredhutch.org wrote: Hi guys, Indeed, the Hits object returned by findOverlaps() is not fully sorted

[Rd] default min-v/nsize parameters

2015-01-15 Thread Michael Lawrence

Just wanted to start a discussion on whether R could ship with more appropriate GC parameters. Right now, loading the recommended package Matrix leads to:
library(Matrix)
gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1076796 57.6    1368491 73.1  1198505 64.1
Vcells 1671329 12.8
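For readers who want to look at the same numbers on their own machine, here is a small sketch that reads the matrix returned by `gc()`. It does not load Matrix, so the figures will differ from the ones quoted above. The `--min-nsize` and `--min-vsize` startup options named in the comment are the ones described in `?Memory`; no particular values are suggested here.

```r
# gc() returns a matrix with one row for Ncells and one for Vcells.
g <- gc()

# Current usage and the level at which the next collection is triggered.
usage <- g[, c("used", "gc trigger")]
print(usage)

# The initial trigger levels can be raised when R is started, using the
# --min-nsize=N and --min-vsize=N command-line options (see ?Memory).
```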

Re: [Rd] Request to speed up save()

2015-01-15 Thread Nathan Kurz

On Thu, Jan 15, 2015 at 11:08 AM, Simon Urbanek simon.urba...@r-project.org wrote: In addition to the major points that others made: if you care about speed, don't use compression. With today's fast disks it's an order of magnitude slower to use compression: d=lapply(1:10, function(x)
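The quoted benchmark is cut off in this preview, so the following is not a reconstruction of it. It is a separate minimal sketch of how one might compare `save()` with and without compression, using the documented `compress` argument. The data (`x_list`) and file names are made up, and timings will depend on the machine and disk.

```r
# Made-up data purely for illustration.
x_list <- lapply(1:10, function(i) rnorm(1e5))

f_raw <- tempfile(fileext = ".RData")
f_gz  <- tempfile(fileext = ".RData")

# Time an uncompressed save and a gzip-compressed save of the same object.
t_raw <- system.time(save(x_list, file = f_raw, compress = FALSE))
t_gz  <- system.time(save(x_list, file = f_gz,  compress = "gzip"))

elapsed <- c(uncompressed = t_raw[["elapsed"]], gzip = t_gz[["elapsed"]])
sizes   <- c(uncompressed = file.size(f_raw),   gzip = file.size(f_gz))
print(elapsed)
print(sizes)

# Either file restores the same object.
e <- new.env()
load(f_raw, envir = e)
```

`compress` also accepts `"bzip2"` and `"xz"`, the other two options mentioned in the thread.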

Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Dario Strbenac

The order of results is not important for the analysis. I have updated the test case with a new expected result. -- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia

[Bioc-devel] ClassifyR Fails to Build on Windows

2015-01-15 Thread Dario Strbenac

Hello, The development version of ClassifyR won't build on Windows. It happens for a code section in the vignette that executes a function that has a bpmapply loop. However, I'm using the default parameters by calling bpparam(), so it should work on Windows. The code in the vignette executes

Re: [Bioc-devel] ClassifyR Fails to Build on Windows

2015-01-15 Thread Dan Tenenbaum

There is no shared memory on Windows so you need to make sure you require() any necessary packages on each node. Dan On January 15, 2015 5:00:22 PM PST, Dario Strbenac dstr7...@uni.sydney.edu.au wrote: Hello, The development version of ClassifyR won't build on Windows. It happens for a code
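The advice above can be illustrated with base R's `parallel` package, used here as a stand-in for BiocParallel (an assumption made so the sketch runs without extra packages). The workers are separate R processes that do not inherit the packages attached in the master session, so the package is loaded on each worker explicitly. `splines` is an arbitrary example package.

```r
library(parallel)

# Two socket workers: separate R processes with no shared memory.
cl <- makePSOCKcluster(2)

# Attach the needed package on every worker before using its functions.
invisible(clusterEvalQ(cl, library(splines)))

# bs() comes from splines and is found because each worker attached it.
res <- parLapply(cl, 1:2, function(i) ncol(bs(1:10, df = 3 + i)))

stopCluster(cl)
print(unlist(res))
```

Without the `clusterEvalQ()` call, the unqualified call to `bs()` would fail on the workers with a "could not find function" error.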

[Rd] Request to speed up save()

2015-01-15 Thread Stewart Morris

Hi, I am dealing with very large datasets and it takes a long time to save a workspace image. The options to save compressed data are: gzip, bzip2 or xz, the default being gzip. I wonder if it's possible to include the pbzip2 (http://compression.ca/pbzip2/) algorithm as an option when

Re: [Bioc-devel] IRanges findOverlaps Result Different for Recent Update

2015-01-15 Thread Hervé Pagès

Hi Michael, On 01/15/2015 11:59 AM, Michael Lawrence wrote: My concern is mostly in user code not seen in Bioc svn. I understand but the fate of that code is to get out of sync sooner or later. And sooner rather than later if it relies on undocumented behavior. But perhaps the partial


13 matches
