[Bioc-devel] Change package name

2019-09-10 Thread Lulu Chen
Dear all,

Can a package's name be changed after it has been released?

I have a package that has been in Bioconductor for more than a year. We are
about to publish a paper about it, and my advisor wants to use a new name
that better reflects the package's purpose.

Thanks for any help!
Best,
Lulu



[Bioc-devel] support the stable version of R

2019-01-14 Thread Lulu Chen
Dear all,

When submitting a package to Bioconductor, the R version in "Depends" must be
raised to at least the devel version (3.6). As my package is also available
on GitHub, someone has asked whether it could be made available for the
stable version of R (R 3.5). In fact, my package works well under R 3.5 if I
change "Depends" back to R (>= 3.5).

So I would like to support R 3.5 for the moment, until the next release.
Should I create another repository, or can I use a branch to support R 3.5?
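
A minimal sketch of what I have in mind (assuming a GitHub branch named
'R3.5'; the Bioconductor repository itself would stay unchanged): the only
difference between the branches would be the Depends field in DESCRIPTION:

    on the default (Bioconductor devel) branch:  Depends: R (>= 3.6)
    on the 'R3.5' branch:                        Depends: R (>= 3.5)

Users on R 3.5 could then install with, e.g.,
devtools::install_github("<user>/<repo>", ref = "R3.5").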

Thanks,
Lulu



Re: [Bioc-devel] Memory usage for bplapply

2019-01-05 Thread Lulu Chen
Hi Martin,

Thanks for your explanation, which helped me understand BiocParallel
much better.

I compared memory usage in my code before packaging (using doSNOW) and after
packaging (using BiocParallel), and found that the extra memory is caused by
the attached packages, especially 'SummarizedExperiment'.
Since packages are required to support common Bioconductor classes, I had
used importFrom(SummarizedExperiment, assay). After deleting this, each
thread saved nearly 200 MB. In a new R session I see
> pryr::mem_used()
38.5 MB
> library(SummarizedExperiment)
> pryr::mem_used()
314 MB
(I am still using R 3.5.2, so I am not sure whether anything has changed in
the devel version.) I think this should be considered an issue: a lot of
packages import SummarizedExperiment just to support the class, never
knowing it can cause such a problem.

My package also imports other packages, e.g. limma and fdrtool. Checked with
pryr::mem_used() as above, each adds only 1-2 MB, and my_package itself loads
in a new session at around 5 MB. However, each thread in the parallel
computation still grows by much more than 5 MB. I ran a simulation: in my old
doSNOW code I inserted require('my_package') into the foreach loop and kept
the rest of the code the same. With 20 cores and 1000 jobs, each thread still
grew by 20-30 MB. I don't know what else might add this extra cost to each
thread. Thanks!
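
For reference, the per-worker cost can also be checked directly from R. A
sketch (assuming the pryr package is installed on the workers):

library(BiocParallel)
## memory used by each worker before and after attaching the package
res <- bplapply(1:4, function(i) {
    before <- pryr::mem_used()
    suppressPackageStartupMessages(library(SummarizedExperiment))
    c(before = before, after = pryr::mem_used())
}, BPPARAM = SnowParam(workers = 4))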

Best,
Lulu

On Fri, Jan 4, 2019 at 2:38 PM Martin Morgan 
wrote:

> Memory use can be complicated to understand.
>
> library(BiocParallel)
>
> v <- replicate(100, rnorm(1), simplify=FALSE)
> bplapply(v, sum)
>
> by default, bplapply splits 100 jobs (each element of the list) equally
> between the number of cores available, and sends just the necessary data to
> the cores. Again by default, the jobs are sent 'en masse' to the cores, so
> if there were 10 cores (and hence 10 tasks), the first core would receive
> the first 10 jobs and 10 x 1 elements, and so on. The memory used to
> store v on the workers would be approximately the size of v: # of workers *
> jobs per worker * job size = 10 * 10 * 1.
>
> If memory were particularly tight, or if computation time for each job was
> highly variable, it might be advantageous to send jobs one at a time, by
> setting the number of tasks equal to the number of jobs, SnowParam(workers =
> 10, tasks = length(v)). Then the amount of memory used to store v would
> only be # of workers * 1 * 1; this is generally slower, because there
> is much more communication between the manager and the workers.
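>
> For example (a sketch):
>
> param <- SnowParam(workers = 10, tasks = length(v))
> bplapply(v, sum, BPPARAM = param)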
>
> m <- matrix(rnorm(100 * 1), 100, 1)
> bplapply(seq_len(nrow(m)), function(i, m) sum(m[i, ]), m)
>
> Here bplapply doesn't know how to send just some rows to the workers, so
> each worker gets a complete copy of m. This would be expensive.
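>
> One workaround (a sketch) is to split m into per-row pieces first, so that
> each worker receives only its own share:
>
> rows <- split(m, row(m))   # a list with one numeric vector per row
> bplapply(rows, sum)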
>
> f <- function(x) sum(x)
>
> g <- function() {
>     v <- replicate(100, rnorm(1), simplify=FALSE)
>     bplapply(v, f)
> }
>
> this has the same memory consequences as above: the function `f()` is
> defined in the .GlobalEnv, so only the function definition (small) is sent
> to the workers.
>
> h <- function() {
>     f <- function(x) sum(x)
>     v <- replicate(100, rnorm(1), simplify=FALSE)
>     bplapply(v, f)
> }
>
>  This is expensive. The function `f()` is defined in the body of the
> function `h()`, so the workers receive both the function `f()` and the
> environment in which it is defined. That environment includes v, so each
> worker receives a slice of v (for `f()` to operate on) AND an entire copy
> of v (because it is in the environment where `f()` was defined). A
> similar cost would be paid in a package, if the package defined large data
> objects at load time.
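>
> One possible way to avoid the copy (a sketch) is to reset the function's
> environment, at the cost of `f()` losing access to `h()`'s local variables:
>
> h2 <- function() {
>     f <- function(x) sum(x)
>     environment(f) <- globalenv()  # f no longer carries h2()'s frame (or v)
>     v <- replicate(100, rnorm(1), simplify=FALSE)
>     bplapply(v, f)
> }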
>
> For more guidance, it might be helpful to provide a simplified example of
> what you did with doSNOW, and what you do with BiocParallel.
>
> Hope that helps,
>
> Martin
>
> On 1/3/19, 11:52 PM, "Bioc-devel on behalf of Lulu Chen" <
> bioc-devel-boun...@r-project.org on behalf of luluc...@vt.edu> wrote:
>
> Dear all,
>
> I ran into a memory issue with bplapply and SnowParam(). I need to
> calculate something from a large matrix many, many times. From the
> discussion in https://support.bioconductor.org/p/92587, I learned that
> bplapply copies the current and parent environments to each worker
> thread. That means the large matrix in my package will be copied many
> times over. Do you have better suggestions for the Windows platform?
>
> Before packaging my code, I used the doSNOW package with foreach
> %dopar%. It seemed to consume less memory in each core (almost the size of
>

[Bioc-devel] Memory usage for bplapply

2019-01-03 Thread Lulu Chen
Dear all,

I ran into a memory issue with bplapply and SnowParam(). I need to calculate
something from a large matrix many, many times. From the discussion in
https://support.bioconductor.org/p/92587, I learned that bplapply copies the
current and parent environments to each worker thread. That means the large
matrix in my package will be copied many times over. Do you have better
suggestions for the Windows platform?

Before packaging my code, I used the doSNOW package with foreach %dopar%,
which seemed to consume less memory in each core (roughly the size of the
matrix each task needs). But bplapply seems to copy more than the objects in
the current environment and the one level above it. I am very confused, and
can only guess that it is copying everything.
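
For concreteness, the pattern I mean is something like this (a hypothetical
sketch, not my actual code):

library(BiocParallel)

run <- function() {
    big <- matrix(rnorm(1e7), 1000, 10000)   # a large (~80 MB) matrix
    f <- function(i) sum(big[i, ])           # f's environment contains big
    bplapply(seq_len(nrow(big)), f, BPPARAM = SnowParam(workers = 4))
}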

Thanks for any help!
Best,
Lulu



Re: [Bioc-devel] how to achieve reproducibility with BiocParallel regardless of number of threads and OS (set.seed is disallowed)

2019-01-03 Thread Lulu Chen
Thanks for teaching me how to set a seed for each job!

On Wed, Jan 2, 2019 at 9:45 AM Martin Morgan 
wrote:

> I'll back-track on my advice a little, and say that the right way to
> enable the user to get reproducible results is to respect the setting the
> user makes outside your function. So for
>
> your = function()
>  unlist(bplapply(1:4, rnorm))
>
> The user will
>
> register(MulticoreParam(2, RNGseed=123))
> your()
>
> to always produce the identical result.
>
> Following Aaron's strategy, the R-level approach to reproducibility might
> be along the lines of
>
> - tell the user to call RNGkind("L'Ecuyer-CMRG") and set.seed()
> - In your function, generate seeds for each job
>
> n = 5; seeds <- vector("list", n)
> seeds[[1]] = .Random.seed  # FIXME: fails if set.seed() or random nos. have not been generated...
> for (i in tail(seq_len(n), -1))
>     seeds[[i]] = nextRNGStream(seeds[[i - 1]])
>
> - send these, along with the job, to the workers, setting .Random.seed on
> each worker
>
> bpmapply(function(i, seed, ...) {
>     oseed <- get(".Random.seed", envir = .GlobalEnv)
>     on.exit(assign(".Random.seed", oseed, envir = .GlobalEnv))
>     assign(".Random.seed", seed, envir = .GlobalEnv)
>     ...
> }, seq_len(n), seeds, ...)
>
> The use of L'Ecuyer-CMRG and `nextRNGStream()` means that the streams on
> each worker are independent. Using on.exit means that, even on the worker,
> the state of the random number generator is not changed by the evaluation.
> This means that even with SerialParam() the generator is well-behaved. I
> don’t know how BiocCheck responds to use of .Random.seed, which in general
> would be a bad thing to do but in this case with the use of on.exit() the
> usage seems ok.
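>
> Putting these pieces together, a self-contained sketch (assuming n = 5 jobs
> and that each job just calls rnorm()) might look like
>
> library(BiocParallel)
> library(parallel)
>
> RNGkind("L'Ecuyer-CMRG")
> set.seed(123)                        # initializes .Random.seed
>
> n <- 5
> seeds <- vector("list", n)
> seeds[[1]] <- .Random.seed
> for (i in tail(seq_len(n), -1))
>     seeds[[i]] <- nextRNGStream(seeds[[i - 1]])
>
> res <- bpmapply(function(i, seed) {
>     ## set the worker's RNG state, restoring any previous state on exit
>     if (exists(".Random.seed", envir = .GlobalEnv)) {
>         oseed <- get(".Random.seed", envir = .GlobalEnv)
>         on.exit(assign(".Random.seed", oseed, envir = .GlobalEnv))
>     }
>     assign(".Random.seed", seed, envir = .GlobalEnv)
>     rnorm(i)
> }, seq_len(n), seeds, SIMPLIFY = FALSE)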
>
> Martin
>
>
> On 12/31/18, 3:17 PM, "Lulu Chen"  wrote:
>
> Hi Martin,
>
>
> Thanks for your help. But setting a different number of workers generates
> different results:
>
>
> > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(1, RNGseed=123)))
>  [1]  1.0654274 -1.2421454  1.0523311 -0.7744536  1.3081934 -1.5305223
>  [7]  1.1525356  0.9287607 -0.4355877  1.5055436
> > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(2, RNGseed=123)))
>  [1] -0.9685927  0.7061091  1.4890213 -0.4094454  0.8909694 -0.8653704
>  [7]  1.4642711  1.2674845 -0.2220491  2.4505322
> > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(3, RNGseed=123)))
>  [1] -0.96859273 -0.40944544  0.89096942 -0.86537045  1.46427111  1.26748453
>  [7] -0.48906078  0.43304237 -0.03195349  0.14670372
> > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(4, RNGseed=123)))
>  [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
>  [7] -1.03886641  1.57451249  0.74708204  0.67187201
>
>
>
> Best,
> Lulu
>
>
>
> On Mon, Dec 31, 2018 at 1:12 PM Martin Morgan 
> wrote:
>
>
> The major BiocParallel objects (SnowParam(), MulticoreParam()) and use
> of bplapply() allow fully repeatable randomizations, e.g.,
>
> > library(BiocParallel)
> > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
>  [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
>  [7] -1.03886641  1.57451249  0.74708204  0.67187201
> > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
>  [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
>  [7] -1.03886641  1.57451249  0.74708204  0.67187201
> > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(RNGseed=123)))
>  [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
>  [7] -1.03886641  1.57451249  0.74708204  0.67187201
>
> The idea then would be to tell the user to register() such a param, or
> to write your function to accept an argument rngSeed along the lines of
>
> f = function(..., rngSeed = NULL) {
>     if (!is.null(rngSeed)) {
>         param = bpparam()  # user's preferred back-end
>         oseed = bpRNGseed(param)
>         on.exit(bpRNGseed(param) <- oseed)
>         bpRNGseed(param) = rngSeed
>     }
>     bplapply(1:4, rnorm)
> }
>
> (actually, this exercise illustrates a problem with bpRNGseed<-() when
> the original seed is NULL; this will be fixed in the next day or so...)
>
> Is that sufficient for your use case?
>
> On 12/31/18, 11:24 AM, "Bioc-devel on behalf of Lulu Chen" <
> bioc-devel-boun...@r-project.org on behalf of
> luluc...@vt.ed

Re: [Bioc-devel] how to achieve reproducibility with BiocParallel regardless of number of threads and OS (set.seed is disallowed)

2019-01-01 Thread Lulu Chen
Hi Martin,

Thanks for your help. But setting a different number of workers generates
different results:

> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(1, RNGseed=123)))
 [1]  1.0654274 -1.2421454  1.0523311 -0.7744536  1.3081934 -1.5305223
 [7]  1.1525356  0.9287607 -0.4355877  1.5055436
> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(2, RNGseed=123)))
 [1] -0.9685927  0.7061091  1.4890213 -0.4094454  0.8909694 -0.8653704
 [7]  1.4642711  1.2674845 -0.2220491  2.4505322
> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(3, RNGseed=123)))
 [1] -0.96859273 -0.40944544  0.89096942 -0.86537045  1.46427111  1.26748453
 [7] -0.48906078  0.43304237 -0.03195349  0.14670372
> unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(4, RNGseed=123)))
 [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
 [7] -1.03886641  1.57451249  0.74708204  0.67187201

Best,
Lulu

On Mon, Dec 31, 2018 at 1:12 PM Martin Morgan 
wrote:

> The major BiocParallel objects (SnowParam(), MulticoreParam()) and use of
> bplapply() allow fully repeatable randomizations, e.g.,
>
> > library(BiocParallel)
> > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
>  [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
>  [7] -1.03886641  1.57451249  0.74708204  0.67187201
> > unlist(bplapply(1:4, rnorm, BPPARAM=MulticoreParam(RNGseed=123)))
>  [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
>  [7] -1.03886641  1.57451249  0.74708204  0.67187201
> > unlist(bplapply(1:4, rnorm, BPPARAM=SnowParam(RNGseed=123)))
>  [1] -0.96859273 -0.40944544  0.89096942 -0.48906078  0.43304237 -0.03195349
>  [7] -1.03886641  1.57451249  0.74708204  0.67187201
>
> The idea then would be to tell the user to register() such a param, or to
> write your function to accept an argument rngSeed along the lines of
>
> f = function(..., rngSeed = NULL) {
>     if (!is.null(rngSeed)) {
>         param = bpparam()  # user's preferred back-end
>         oseed = bpRNGseed(param)
>         on.exit(bpRNGseed(param) <- oseed)
>         bpRNGseed(param) = rngSeed
>     }
>     bplapply(1:4, rnorm)
> }
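>
> A user could then write, e.g.,
>
> register(SnowParam(4, RNGseed = 1))  # the user's preferred back-end
> f(rngSeed = 42)                      # rngSeed temporarily overrides the
>                                      # back-end's seed, restored on exit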
>
> (actually, this exercise illustrates a problem with bpRNGseed<-() when the
> original seed is NULL; this will be fixed in the next day or so...)
>
> Is that sufficient for your use case?
>
> On 12/31/18, 11:24 AM, "Bioc-devel on behalf of Lulu Chen" <
> bioc-devel-boun...@r-project.org on behalf of luluc...@vt.edu> wrote:
>
> Dear all,
>
> I posted the question on the Bioconductor support site (
> https://support.bioconductor.org/p/116381/), and it was suggested that I
> direct future correspondence here.
>
> I plan to generate a vector of seeds (provided by users through an
> argument of my R function) and use them with set.seed() in each parallel
> computation. However, set.seed() causes a warning in BiocCheck().
>
> Someone suggested rewriting the code in C++, which is a good idea, but it
> would take me much more time to rewrite some functions from other
> packages, e.g. eBayes() in limma.
>
> Hope to get more suggestions from you. Thanks a lot!
>
> Best,
> Lulu
>


[Bioc-devel] permanent link to package

2018-07-31 Thread Lulu Chen
Dear Bioc Team,

My package is still in the devel version, but I need to add a permanent URL
to my paper. Which option is better:
http://bioconductor.org/packages/CAMTHC  or the DOI 10.18129/B9.bioc.CAMTHC?


Thanks,
Lulu
