On 01/15/2015 01:45 PM, Stewart Morris wrote:
Hi,
I am dealing with very large datasets and it takes a long time to save a
workspace image.
The options to save compressed data are: gzip, bzip2 or xz, the
default being gzip. I wonder if it's possible to include the pbzip2
(http://compression.ca/pbzip2/) algorithm as an option when saving.
Given a large data.frame, a function trains a series of models by looping over
two steps:
1. Create a model-specific subset of the complete training data
2. Train a model on the subset data
The function returns a list of trained models which are later used for
prediction on test data.
Due to
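The two-step loop described above can be sketched in base R as follows; the `group` column, the subsetting predicate, and `lm()` are illustrative stand-ins for the real data and training code:

```r
# Hypothetical sketch: train one model per group by (1) subsetting the
# complete training data, then (2) fitting on the subset.
train_models <- function(dat, groups) {
  lapply(groups, function(g) {
    sub <- dat[dat$group == g, , drop = FALSE]  # step 1: model-specific subset
    lm(y ~ x, data = sub)                       # step 2: train on the subset
  })
}

set.seed(1)
dat <- data.frame(group = rep(c("a", "b"), each = 50),
                  x = rnorm(100), y = rnorm(100))
models <- train_models(dat, c("a", "b"))
```

The returned list can then be iterated over for prediction on test data.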
If it's not documented, it should be, because Patrick did it on purpose
(the output from the IntervalTree code is not sorted). We could add an
argument to disable the sorting for when the extra speed is desired. But it
has proven useful.
On Thu, Jan 15, 2015 at 6:42 AM, Kasper Daniel Hansen wrote:
On Thu, 15 Jan 2015, Christian Sigg wrote:
Given a large data.frame, a function trains a series of models by looping over
two steps:
1. Create a model-specific subset of the complete training data
2. Train a model on the subset data
The function returns a list of trained models which are
Hi guys,
Indeed, the Hits object returned by findOverlaps() is not fully
sorted anymore. Now it's sorted by query hit *only* and not by query
hit *and* subject hit. Fully sorting a big Hits object has a high
cost, both in terms of time and memory footprint. The partial
sorting is *much* cheaper:
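The distinction can be illustrated in base R with toy vectors standing in for the queryHits()/subjectHits() of a real Hits object:

```r
# Toy hit pairs standing in for queryHits()/subjectHits()
q <- c(2L, 1L, 2L, 1L)
s <- c(3L, 2L, 1L, 1L)

partial <- order(q)     # by query hit only (the new, cheaper behavior)
full    <- order(q, s)  # by query hit *and* subject hit (the old order)

# Both orderings agree on the query column...
identical(q[partial], q[full])  # TRUE
# ...but within a query the subject hits may come out in a different order
identical(s[partial], s[full])  # FALSE for this example
```

Code that relied on the fully sorted output needs to re-sort the hits itself (or treat the within-query order as unspecified).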
My concern is mostly in user code not seen in Bioc svn. But perhaps the
partial sorting (by query) is sufficient for many of those.
On Thu, Jan 15, 2015 at 11:34 AM, Hervé Pagès hpa...@fredhutch.org wrote:
Hi guys,
Indeed, the Hits object returned by findOverlaps() is not fully
sorted
Just wanted to start a discussion on whether R could ship with more
appropriate GC parameters. Right now, loading the recommended package
Matrix leads to:
library(Matrix)
gc()
           used (Mb) gc trigger (Mb) max used (Mb)
Ncells  1076796 57.6    1368491 73.1  1198505 64.1
Vcells  1671329 12.8
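For reference, `gc()` returns this table as a matrix, and the initial trigger sizes can be raised at startup via the `--min-nsize`/`--min-vsize` command-line options documented in `?Memory` (the sizes below are illustrative, not a recommendation):

```r
# gc() returns a matrix with rows Ncells (cons cells) and Vcells
# (vector heap); "gc trigger" is the level at which the next
# collection fires.
g <- gc()
g["Ncells", "gc trigger"]  # current trigger, in cells
g["Vcells", "used"]        # current vector-heap use, in cells

# Larger initial triggers (fewer collections while loading packages)
# can be requested at startup, e.g.:
#   R --min-nsize=1M --min-vsize=64M
```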
On Thu, Jan 15, 2015 at 11:08 AM, Simon Urbanek
simon.urba...@r-project.org wrote:
In addition to the major points that others made: if you care about speed,
don't use compression. With today's fast disks it's an order of magnitude
slower to use compression:
d=lapply(1:10, function(x)
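Simon's snippet is cut off in the archive; a self-contained version of the kind of comparison he describes might look like this (object sizes and file names are illustrative):

```r
# Compare save() with and without compression on throwaway data.
d <- lapply(1:10, function(x) rnorm(1e6))
f_plain <- tempfile(fileext = ".RData")
f_gzip  <- tempfile(fileext = ".RData")

t_plain <- system.time(save(d, file = f_plain, compress = FALSE))["elapsed"]
t_gzip  <- system.time(save(d, file = f_gzip,  compress = "gzip"))["elapsed"]

cat(sprintf("uncompressed: %.2fs, gzip: %.2fs\n", t_plain, t_gzip))
```

On a fast local disk the uncompressed save typically finishes well ahead of the compressed one; on slow or network storage the trade-off can go the other way.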
The order of results is not important for the analysis. I have updated the test
case with a new expected result.
--
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
Hello,
The development version of ClassifyR won't build on Windows. It happens for a
code section in the vignette that executes a function that has a bpmapply loop.
However, I'm using the default parameters by calling bpparam(), so it should
work on Windows. The code in the vignette executes
There is no shared memory on Windows, so you need to make sure you require() any
necessary packages on each node.
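Dan's point can be sketched with base `parallel` (BiocParallel's SnowParam on Windows behaves the same way): PSOCK workers are fresh R processes, so packages must be attached on each one. `library(stats)` here is just a placeholder for a real dependency:

```r
library(parallel)

cl <- makeCluster(2)  # PSOCK cluster: fresh R processes, as on Windows

# Attach any needed package on every worker; stats is a placeholder.
clusterEvalQ(cl, library(stats))

res <- parLapply(cl, 1:4, function(i) {
  # alternatively call require("somePkg") here, once per worker
  i^2
})
stopCluster(cl)
unlist(res)
```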
Dan
On January 15, 2015 5:00:22 PM PST, Dario Strbenac dstr7...@uni.sydney.edu.au
wrote:
Hello,
The development version of ClassifyR won't build on Windows. It happens
for a code
Hi,
I am dealing with very large datasets and it takes a long time to save a
workspace image.
The options to save compressed data are: gzip, bzip2 or xz, the
default being gzip. I wonder if it's possible to include the pbzip2
(http://compression.ca/pbzip2/) algorithm as an option when saving.
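One way to get this today, without changes to R itself, is to stream the image through the external compressor via a `pipe()` connection: `save()` and `load()` both accept connections. The sketch below uses `gzip` so it runs anywhere; swapping in `pbzip2 -c` (assuming it is on the PATH) gives parallel bzip2 compression:

```r
x <- runif(10)
f <- tempfile(fileext = ".RData.gz")

# save() writes uncompressed to the connection; the external tool compresses.
con <- pipe(paste("gzip -c >", shQuote(f)), "wb")
save(x, file = con)
close(con)

rm(x)
con <- pipe(paste("gzip -dc <", shQuote(f)), "rb")
load(con)
close(con)
```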
Hi Michael,
On 01/15/2015 11:59 AM, Michael Lawrence wrote:
My concern is mostly in user code not seen in Bioc svn.
I understand but the fate of that code is to get out of sync
sooner or later. And sooner rather than later if it relies on
undocumented behavior.
But perhaps the partial sorting (by query) is sufficient for many of those.