[Bioc-devel] VariantAnnotation: DNAStringSets for ref/alt alleles in 'VRanges' class

2013-11-04 Thread Julian Gehring

Hi,

Would it be reasonable to (optionally) allow storing the reference and 
alternative alleles in the 'VRanges' class as a 'DNAStringSet'? 
Currently, 'character' and 'Rle' are possible.  Having a 'DNAStringSet' 
would make it more consistent with the rest of the 'VariantAnnotation' 
framework and make use of the efficient 'Biostrings' string handling 
infrastructure.


Best wishes
Julian

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] VariantAnnotation: DNAStringSets for ref/alt alleles in 'VRanges' class

2013-11-04 Thread Michael Lawrence
This was a consideration. I guess I've never got much use out of them being
DNAStringSets, so I just went with the simple character vectors. It makes
sense to support DNAStringSet. I could imagine someone e.g. wanting to
represent mutations at the protein-level, and structural variants will
require more complexity, but DNA is by far the most common use case. Are
you willing to submit this as a patch?

Just out of curiosity, how are you using Biostrings in this case?


On Mon, Nov 4, 2013 at 1:12 AM, Julian Gehring julian.gehr...@embl.de wrote:

 Hi,

 Would it be reasonable to (optionally) allow storing the reference and
 alternative alleles in the 'VRanges' class as a 'DNAStringSet'? Currently,
 'character' and 'Rle' are possible.  Having a 'DNAStringSet' would make it
 more consistent with the rest of the 'VariantAnnotation' framework and make
 use of the efficient 'Biostrings' string handling infrastructure.

 Best wishes
 Julian



Re: [Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?

2013-11-04 Thread Ryan
Actually, the check that I proposed is only supposed to check for usage 
of user-defined variables, not variables from packages. Truthfully, 
though, I guess I'm not the right person to work on this, since in 
practice I use forked processes for the vast majority of my inside-R 
parallelization, so I never have to worry about things being undefined 
in the forked subprocess. Therefore I can't really dogfood any of the 
stuff that might be implemented as a result of this thread.


-Ryan

On Mon Nov  4 03:48:23 2013, Michael Lawrence wrote:

So what is the best practice for ensuring that something is actually
visible to the worker? If the worker needs functionality from a
package, should the namespace be explicitly referenced via ::?  Lazy
users might want to include library() calls in the worker function.
This proposed check will then throw an exception. Probably a good
thing, but is there a way for a user to declare imported namespaces?
 I know that BatchJobs allows for passing a list of packages to be
loaded via library() on the worker. That is leveraging the search path
to make sure everything is visible and is a reasonable compromise (::
is always an option). We could essentially reimplement the search path
if we wanted isolation, but the worker is already isolated. Anyway,
somehow those types of declarations should be taken into account.
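
As a rough sketch of what such a declaration could look like: the helper name run_with_packages() below is hypothetical and is not part of BatchJobs or BiocParallel. The declared packages are loaded as namespaces (not attached) before the function runs, and the worker itself refers to package functions through ::.

```r
# Hypothetical helper: verify that every declared package can be loaded
# (namespace only, nothing is attached), then call the worker function.
run_with_packages <- function(fun, packages = character(), ...) {
  for (pkg in packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("package not available on this worker: ", pkg)
    }
  }
  fun(...)
}

# The worker references what it needs through '::', so it does not
# depend on the state of the search path.
worker <- function(x) stats::median(x)

result <- run_with_packages(worker, packages = "stats", x = c(1, 3, 2))
print(result)
```

A missing package then fails early with a clear message instead of an "object not found" error somewhere inside the worker.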

Moving back to the general discussion, for complex operations, it's
easiest to have the worker in a package. In that case, the worker will
likely rely on other functions, and the cleanest way to get those
functions to the worker is to have them installed as a package. At
least with BatchJobs, when the worker is inside a package namespace,
that namespace is automatically loaded (but not attached), so all
functions are automatically visible, without any extra work by me.

Michael


On Sun, Nov 3, 2013 at 10:46 PM, Ryan r...@thompsonclan.org wrote:

Ok, here is my attempt at a function to get the list of
user-defined free variables that a function refers to:

https://gist.github.com/DarwinAwardWinner/7298557

It uses codetools, so it is subject to the limitations of that
package, but for simple examples, it successfully detects when a
function refers to something in the global env.


On Sun Nov  3 21:14:29 2013, Gabriel Becker wrote:

Ryan (et al),

FYI:

> f
function() {
x = rnorm(x)
x
}
> findGlobals(f)
[1] "="     "{"     "rnorm"

x should be in the list of globals but it isn't.
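
The limitation can be reproduced with a small sketch, assuming the codetools package is installed. The merge = FALSE argument only separates functions from variables; the function f is the one from the message, written with <- instead of =.

```r
library(codetools)

f <- function() {
  x <- rnorm(x)  # the 'x' passed to rnorm() is read before the local 'x' exists
  x
}

# merge = FALSE returns functions and variables as separate list elements
globals <- findGlobals(f, merge = FALSE)
print(globals$functions)
print(globals$variables)

# 'x' is assigned somewhere in the body, so it is treated as local
# throughout and is not reported among the globals.
print("x" %in% unlist(globals))
```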

~G

 sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
  base

other attached packages:
[1] codetools_0.2-8



On Sun, Nov 3, 2013 at 5:37 PM, Ryan r...@thompsonclan.org wrote:

Looking at the codetools package, I think findGlobals is
basically exactly what we want here, right? As you say, there are
necessarily limitations due to R being a dynamic language, but the
goal is to catch common errors, not stop people from tricking the
check.

I think I'll try to code something up soon.

-Ryan


On 11/3/13, 5:10 PM, Gabriel Becker wrote:

Henrik,

See https://github.com/duncantl/CodeDepends (as used by
https://github.com/gmbecker/RCacheSuite). It will identify
necessarily defined symbols (input variables) for code that is
not doing certain tricks (e.g. get(), mixing data.frame columns and
global variables in formulas, etc.).

Tierney's codetools package also does things along these lines,
but there are some situations where it has trouble. I can give
more detail if desired.

~G


On Sun, Nov 3, 2013 at 3:04 PM, Ryan r...@thompsonclan.org wrote:

Another potential easy 

Re: [Bioc-devel] VariantAnnotation: DNAStringSets for ref/alt alleles in 'VRanges' class

2013-11-04 Thread Julian Gehring

Hi Michael,

Sure, I'll try to dig into it and construct a patch that adds this feature.

I stumbled upon this after converting data between the 'VCF' and 
'VRanges' classes.  The primary use case I had in mind is more 
efficient storage and processing of short InDels, or defining variants 
by ref/alt alleles also with respect to the sequence context.


Best wishes
Julian


On 11/04/2013 12:56 PM, Michael Lawrence wrote:

This was a consideration. I guess I've never got much use out of them
being DNAStringSets, so I just went with the simple character vectors.
It makes sense to support DNAStringSet. I could imagine someone e.g.
wanting to represent mutations at the protein-level, and structural
variants will require more complexity, but DNA is by far the most common
use case. Are you willing to submit this as a patch?

Just out of curiosity, how are you using Biostrings in this case?


On Mon, Nov 4, 2013 at 1:12 AM, Julian Gehring julian.gehr...@embl.de wrote:

Hi,

Would it be reasonable to (optionally) allow storing the reference
and alternative alleles in the 'VRanges' class as a 'DNAStringSet'?
Currently, 'character' and 'Rle' are possible.  Having a
'DNAStringSet' would make it more consistent with the rest of the
'VariantAnnotation' framework and make use of the efficient
'Biostrings' string handling infrastructure.

Best wishes
Julian



Re: [Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?

2013-11-04 Thread Gabriel Becker
Weird, I guess it needs to be logged in or something. I don't know if the
issue is that it's in a non-master branch or what. The repo is fully public
and the forCRAN_0.3.5 branch definitely exists on github.

I started chrome (where I'm not logged into github) and got the same 404
error, but after going to the repo, changing the branch, and navigating to
the file, it now works even when I quit chrome and restart it. I don't know
if it needed me to do that or if there was an intermittent problem that is
now fixed.

Anyway, here is the raw code, the link for which seems to work (in a
browser where I'm not logged into github). If it still doesn't I can just
attach the file here if you want. It doesn't rely on any of the rest of the
CodeDepends machinery.

https://raw.github.com/duncantl/CodeDepends/forCRAN_0.3.5/R/librarySymbols.R


~G


On Mon, Nov 4, 2013 at 11:34 AM, Michael Lawrence lawrence.mich...@gene.com
 wrote:

 The dynamic nature of R limits the extent of these checks. But as Ryan has
 noted, a simple sanity check goes a long way. If what he has done could be
 extended to the rest of the search path (people always forget to attach
 packages), I think we've hit the 80% with 20%. Got a 404 on that URL btw.

 Michael


 On Mon, Nov 4, 2013 at 11:05 AM, Gabriel Becker gmbec...@ucdavis.edu wrote:

 Hey guys,

 Here is code that I have written which resolves library names into a full
 list of symbols:

 https://github.com/duncantl/CodeDepends/blob/forCRAN_0.3.5/R/librarySymbols.R

 Note this does not require that the packages actually be loaded at the time
 of the check, and does not load them (or rather, it loads them but does not
 attach them, so no searchpath muddying occurs). You do need a list of
 packages to check though (it adds the base ones automatically). It handles
 dependency and could be easily extended to handle suggests as well I think.

 When CodeDepends gets pushed to cran (not my call and not high on my
 priority list to push for currently) it will actually do exactly what you
 want. (the forCRAN_0.3.5 branch already does and I believe it is
 documented, so you could use devtools to install it now).

 As a side note, I'm not sure that existence of a symbol is sufficient (it
 certainly is necessary). What about situations where the symbol exists but
 is stale compared to the value in the parent? Are we sure that can never
 happen?

 ~G


 On Mon, Nov 4, 2013 at 7:29 AM, Michel Lang michell...@gmail.com wrote:

  You might want to consider using Recall() for recursion, which should solve
  this. Determining the required variables using heuristics such as codetools
  will probably lead to some confusion when using functions which include calls
  to, e.g., with():
 
  f = function() {
with(iris, Sepal.Length + Sepal.Width)
  }
  codetools:::findGlobals(f)
 
  I would suggest writing up some documentation on what the function's
  environment contains and how to define variables accordingly - or why it
  can generally be considered a good idea to pass everything essential as an
  argument. Nevertheless, a bpExport function would be a good addition for
  some rare corner cases in my opinion.
 
  Michel
 
 
  2013/11/3 Henrik Bengtsson h...@biostat.ucsf.edu
 
   Hi,
  
   in BiocParallel, is there a suggested (or planned) best standard for
   making *locally* assigned variables (e.g. functions) available to the
   applied function when it runs in a separate R process (which will be
   the most common use case)?  I understand that local variables
   should be avoided and it's preferred to put as much as possible in
   packages, but that's not always possible or very convenient.
  
   EXAMPLE:
  
   library('BiocParallel')
   library('BatchJobs')
  
   # Here I pick a recursive function to make the problem a bit harder, i.e.
   # the function needs to call itself (itself = see below)
   fib <- function(n=0) {
     if (n < 0) stop("Invalid 'n': ", n)
     if (n == 0 || n == 1) return(1)
     fib(n-2) + fib(n-1)
   }

   # Executing in the current R session
   cluster.functions <- makeClusterFunctionsInteractive()
   bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
   register(bpParams)
   values <- bplapply(0:9, FUN=fib)
   ## SubmitJobs |++| 100% (00:00:00)
   ## Waiting [S:0 R:0 D:10 E:0] |+++| 100% (00:00:00)


   # Executing in a separate R process, where fib() is not defined
   # (not specific to BiocParallel)
   cluster.functions <- makeClusterFunctionsLocal()
   bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
   register(bpParams)
   values <- bplapply(0:9, FUN=fib)
   ## SubmitJobs |++| 100% (00:00:00)
   ## Waiting [S:0 R:0 D:10 E:0] |+++| 100% (00:00:00)
   Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
     Errors occurred during execution. First error message:
   Error in 

Re: [Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?

2013-11-04 Thread Ryan Thompson
The code that I wrote intentionally avoids checking for package variables,
since I consider that a separate problem. Package variables can be provided
to the child by loading the package, whereas user-defined variables must be
serialized in the parent and sent to the child.

I think I could fairly easily adapt the same code to return a list of all
packages that a function depends on.
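
A minimal sketch of what such an adaptation might look like, assuming codetools is installed. The helper names where_defined() and packages_needed() are made up for illustration and are not taken from the gist; only symbols that resolve on the current search path are considered.

```r
library(codetools)

# Hypothetical helper: first entry on the search path that defines 'sym',
# or NA if no entry does.
where_defined <- function(sym) {
  for (entry in search()) {
    if (exists(sym, envir = as.environment(entry), inherits = FALSE)) {
      return(entry)
    }
  }
  NA_character_
}

# Hypothetical helper: names of attached packages that provide the
# global symbols used by 'fun'.
packages_needed <- function(fun) {
  syms <- findGlobals(fun)
  locations <- vapply(syms, where_defined, character(1))
  pkgs <- locations[!is.na(locations) & grepl("^package:", locations)]
  sort(unique(sub("^package:", "", pkgs)))
}

worker <- function(x) median(x) + sd(x)
pkgs <- packages_needed(worker)
print(pkgs)
```

Symbols that resolve to .GlobalEnv or to nothing at all are dropped here; those are the user-defined variables that the original check is meant to report.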

-Ryan
On Nov 4, 2013 11:35 AM, Michael Lawrence lawrence.mich...@gene.com
wrote:

 The dynamic nature of R limits the extent of these checks. But as Ryan has
 noted, a simple sanity check goes a long way. If what he has done could be
 extended to the rest of the search path (people always forget to attach
 packages), I think we've hit the 80% with 20%. Got a 404 on that URL btw.

 Michael


 On Mon, Nov 4, 2013 at 11:05 AM, Gabriel Becker gmbec...@ucdavis.edu
 wrote:

  Hey guys,
 
  Here is code that I have written which resolves library names into a full
  list of symbols:
 
 
  https://github.com/duncantl/CodeDepends/blob/forCRAN_0.3.5/R/librarySymbols.R

  Note this does not require that the packages actually be loaded at the time
  of the check, and does not load them (or rather, it loads them but does not
  attach them, so no searchpath muddying occurs). You do need a list of
  packages to check though (it adds the base ones automatically). It handles
  dependency and could be easily extended to handle suggests as well I think.
 
  When CodeDepends gets pushed to cran (not my call and not high on my
  priority list to push for currently) it will actually do exactly what you
  want. (the forCRAN_0.3.5 branch already does and I believe it is
  documented, so you could use devtools to install it now).
 
  As a side note, I'm not sure that existence of a symbol is sufficient (it
  certainly is necessary). What about situations where the symbol exists
 but
  is stale compared to the value in the parent? Are we sure that can never
  happen?
 
  ~G
 
 
  On Mon, Nov 4, 2013 at 7:29 AM, Michel Lang michell...@gmail.com
 wrote:
 
   You might want to consider using Recall() for recursion, which should solve
   this. Determining the required variables using heuristics such as codetools
   will probably lead to some confusion when using functions which include calls
   to, e.g., with():
  
   f = function() {
 with(iris, Sepal.Length + Sepal.Width)
   }
   codetools:::findGlobals(f)
  
   I would suggest writing up some documentation on what the function's
   environment contains and how to define variables accordingly - or why it
   can generally be considered a good idea to pass everything essential as an
   argument. Nevertheless, a bpExport function would be a good addition for
   some rare corner cases in my opinion.
  
   Michel
  
  
   2013/11/3 Henrik Bengtsson h...@biostat.ucsf.edu
  
Hi,
   
in BiocParallel, is there a suggested (or planned) best standard for
making *locally* assigned variables (e.g. functions) available to the
applied function when it runs in a separate R process (which will be
the most common use case)?  I understand that local variables
should be avoided and it's preferred to put as much as possible in
packages, but that's not always possible or very convenient.
   
EXAMPLE:
   
library('BiocParallel')
library('BatchJobs')
   
# Here I pick a recursive function to make the problem a bit harder, i.e.
# the function needs to call itself (itself = see below)
fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n-2) + fib(n-1)
}

# Executing in the current R session
cluster.functions <- makeClusterFunctionsInteractive()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++| 100% (00:00:00)


# Executing in a separate R process, where fib() is not defined
# (not specific to BiocParallel)
cluster.functions <- makeClusterFunctionsLocal()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++| 100% (00:00:00)
Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
  Errors occurred during execution. First error message:
Error in FUN(...): could not find function "fib"
[...]


# The following illustrates that the solution is not always straightforward.
# (not specific to BiocParallel; must have been discussed previously)
values <- bplapply(0:9, FUN=function(n, fib) {
  fib(n)
}, fib=fib)
Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
  

Re: [Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?

2013-11-04 Thread Gabriel Becker
Ryan,

I agree that in some sense it is a different problem, but my point is with
a different approach we can easily answer both. The code I posted returns a
named character vector of symbol names with package name being the name.

This makes it a trivial lookup to determine both a) what symbols aren't
available in any of the packages and b) what packages provide the remaining
required symbols. No extra work required.
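
The lookup described above might be sketched as follows. This is not the CodeDepends code; symbols_by_package() is a hypothetical stand-in built on base R's getNamespaceExports(), and my_helper is an invented symbol that no package provides.

```r
# Build a character vector of exported symbols whose names are the
# providing packages. Namespaces are loaded but not attached.
symbols_by_package <- function(pkgs) {
  per_pkg <- lapply(pkgs, function(p) {
    syms <- getNamespaceExports(p)
    stats::setNames(syms, rep(p, length(syms)))
  })
  unlist(per_pkg)
}

available <- symbols_by_package(c("stats", "utils"))
needed <- c("median", "read.csv", "my_helper")  # 'my_helper' is made up

# a) symbols that none of the packages provide
missing_syms <- setdiff(needed, available)

# b) which package provides each of the remaining symbols
found <- available[available %in% needed]
providers <- stats::setNames(names(found), found)

print(missing_syms)
print(providers[c("median", "read.csv")])
```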

You do have to give it a list of packages to check, but it is easy to write
a wrapper that automatically passes it all currently attached packages if
desired (a combination of search() and gsub() would be a quick and dirty
way to do this).
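
A quick and dirty version of that wrapper input could look like this; attached_packages() is a hypothetical name, and the result depends on what is attached in the current session.

```r
# Names of all currently attached packages, derived from the search path
attached_packages <- function() {
  entries <- search()
  pkgs <- entries[grepl("^package:", entries)]  # drop .GlobalEnv, Autoloads, ...
  gsub("^package:", "", pkgs)
}

pkgs <- attached_packages()
print(pkgs)
```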

All that said, I'm simply trying to help. If you guys don't want to use my
code/approach that is your prerogative as I'm not currently working on
BiocParallel myself.

~G




On Mon, Nov 4, 2013 at 11:54 AM, Ryan Thompson r...@thompsonclan.org wrote:

 The code that I wrote intentionally avoids checking for package variables,
 since I consider that a separate problem. Package variables can be provided
 to the child by loading the package, whereas user-defined variables must be
 serialized in the parent and sent to the child.

 I think I could fairly easily adapt the same code to return a list of all
 packages that a function depends on.

 -Ryan
 On Nov 4, 2013 11:35 AM, Michael Lawrence lawrence.mich...@gene.com
 wrote:

 The dynamic nature of R limits the extent of these checks. But as Ryan has
 noted, a simple sanity check goes a long way. If what he has done could be
 extended to the rest of the search path (people always forget to attach
 packages), I think we've hit the 80% with 20%. Got a 404 on that URL btw.

 Michael


 On Mon, Nov 4, 2013 at 11:05 AM, Gabriel Becker gmbec...@ucdavis.edu
 wrote:

  Hey guys,
 
  Here is code that I have written which resolves library names into a
 full
  list of symbols:
 
 
  https://github.com/duncantl/CodeDepends/blob/forCRAN_0.3.5/R/librarySymbols.R

  Note this does not require that the packages actually be loaded at the time
  of the check, and does not load them (or rather, it loads them but does not
  attach them, so no searchpath muddying occurs). You do need a list of
  packages to check though (it adds the base ones automatically). It handles
  dependency and could be easily extended to handle suggests as well I think.
 
  When CodeDepends gets pushed to cran (not my call and not high on my
  priority list to push for currently) it will actually do exactly what
 you
  want. (the forCRAN_0.3.5 branch already does and I believe it is
  documented, so you could use devtools to install it now).
 
  As a side note, I'm not sure that existence of a symbol is sufficient
 (it
  certainly is necessary). What about situations where the symbol exists
 but
  is stale compared to the value in the parent? Are we sure that can never
  happen?
 
  ~G
 
 
  On Mon, Nov 4, 2013 at 7:29 AM, Michel Lang michell...@gmail.com
 wrote:
 
   You might want to consider using Recall() for recursion, which should solve
   this. Determining the required variables using heuristics such as codetools
   will probably lead to some confusion when using functions which include calls
   to, e.g., with():
  
   f = function() {
 with(iris, Sepal.Length + Sepal.Width)
   }
   codetools:::findGlobals(f)
  
   I would suggest writing up some documentation on what the function's
   environment contains and how to define variables accordingly - or why it
   can generally be considered a good idea to pass everything essential as an
   argument. Nevertheless, a bpExport function would be a good addition for
   some rare corner cases in my opinion.
  
   Michel
  
  
   2013/11/3 Henrik Bengtsson h...@biostat.ucsf.edu
  
Hi,
   
in BiocParallel, is there a suggested (or planned) best standard for
making *locally* assigned variables (e.g. functions) available to the
applied function when it runs in a separate R process (which will be
the most common use case)?  I understand that local variables
should be avoided and it's preferred to put as much as possible in
packages, but that's not always possible or very convenient.
   
EXAMPLE:
   
library('BiocParallel')
library('BatchJobs')
   
# Here I pick a recursive function to make the problem a bit harder, i.e.
# the function needs to call itself (itself = see below)
fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n-2) + fib(n-1)
}

# Executing in the current R session
cluster.functions <- makeClusterFunctionsInteractive()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++| 100% (00:00:00)
   
   
# Executing in a separate R process, where fib() is 

Re: [Bioc-devel] disappearing .tex file when running R CMD Sweave on a new vignette

2013-11-04 Thread Henrik Bengtsson
On Mon, Nov 4, 2013 at 12:46 PM, Dan Tenenbaum dtene...@fhcrc.org wrote:


 - Original Message -
 From: Tim Triche, Jr. tim.tri...@gmail.com
 To: bioc-devel@r-project.org
 Sent: Monday, November 4, 2013 12:25:19 PM
 Subject: [Bioc-devel] disappearing .tex file when running R CMD Sweave on a  
  new vignette

 I get a bizarre error when compiling a newly-added Methylumi
 vignette:

 10 : echo keep.source term verbatim (label = sessioninfo,
 methylumi450k.Rnw:136)
 Error in driver$finish(drobj) :
   the output file 'methylumi450k.tex' has disappeared
 Calls: <Anonymous> -> do.call -> <Anonymous> -> <Anonymous>
 Execution halted

 This is bizarre because 1) the file is still there, and 2) all the heavy
 lifting is done.

 sessionInfo(), etc. is included properly and the vignette concludes with
 \end{document}, but nothing I do seems to resolve this driver error.

 Any suggestions would be most appreciated.


 Probably has to do with calling setwd() in the vignette? Maybe you need an 
 on.exit() that restores the original directory. My guess is that you changed 
 directory and then R can't see the tex file because it's in a different 
 directory. See
 http://stackoverflow.com/questions/12162092/r-sweave-output-error

tools::buildVignettes(), which is used by 'R CMD build', tries to
protect against this by always resetting the working directory after
weave:ing and tangle:ing a vignette, cf.
http://svn.r-project.org/R/trunk/src/library/tools/R/Vignettes.R.
tools::buildVignette() [no plural 's'], which is used by 'R CMD
Sweave' (which is what Tim uses), should also do this, but looking at
the code, this may only work properly if argument 'dir' is an absolute
path, which may not be the case (not sure).  I've just submitted a
bug report PR#15530
[https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15530] with a
patch on this.

There may also be related issues in the utils::Sweave drivers (looks
like code running during garbage collection) - I'll let someone else
look into that.  But, undoing setwd():s in the vignette should solve
this, iff that's what's behind this in the first place.
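
For code in a vignette that really needs to change directory, one defensive pattern is to restore the original directory with on.exit(). The sketch below uses a hypothetical helper named with_dir(); it is not part of the tools or utils packages.

```r
# Hypothetical helper: evaluate 'code' with 'dir' as working directory and
# restore the previous directory afterwards, even if 'code' fails.
with_dir <- function(dir, code) {
  old <- setwd(dir)                # setwd() returns the previous directory
  on.exit(setwd(old), add = TRUE)
  force(code)                      # 'code' is evaluated lazily, i.e. after setwd()
}

before <- getwd()
inside <- with_dir(tempdir(), basename(getwd()))
after <- getwd()

print(inside)
print(identical(before, after))
```

Because the working directory is back to its original value when the chunk finishes, the Sweave driver looks for the .tex file where it was created.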

/Henrik


 Dan


 Thanks,

 --t


 *He that would live in peace and at ease, *
 *Must not speak all he knows, nor judge all he sees.*

 Benjamin Franklin, Poor Richard's
 Almanack http://archive.org/details/poorrichardsalma00franrich



Re: [Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?

2013-11-04 Thread Ryan


On 11/4/13, 11:05 AM, Gabriel Becker wrote:

As a side note, I'm not sure that existence of a symbol is sufficient (it
certainly is necessary). What about situations where the symbol exists but
is stale compared to the value in the parent? Are we sure that can never
happen?
I think this is a different issue. We want to detect when a function 
depends on variables outside that function in the user's workspace, or 
variables defined in a package that the user has loaded. I think we can 
assume that R child processes will be of the same version with the same 
set of installed packages, so package-defined variables will not have 
different values in child processes. For user variables, I think the 
goal should be to prevent (or at least highly discourage) dependencies 
on them entirely, so I don't think it matters what their value may be in 
the child. I realize this is somewhat counter to the question that 
started this thread, which was about exporting variables to the 
children, but I think it is the most straightforward approach. As I 
believe someone noted earlier in the thread, Henrik's original problem 
of a recursive function is properly solved by using the Recall function.
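
A minimal sketch of the Recall() variant, run in a plain R session rather than through BiocParallel. The function is Henrik's fib() with the self-reference replaced; the rename to g is only there to show that the symbol fib is no longer needed.

```r
fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  # Recall() calls the currently executing function, whatever its name is
  Recall(n - 2) + Recall(n - 1)
}

# Rename the function and remove the original binding: the recursion still
# works because the body no longer looks up the symbol 'fib'.
g <- fib
rm(fib)

values <- vapply(0:9, g, numeric(1))
print(values)
```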


-Ryan



Re: [Rd] R 3.1.0 and C++11

2013-11-04 Thread Romain Francois

Le 03/11/2013 22:45, Michael Kane a écrit :

I'd like to echo Whit's sentiment and hopefully warm up this thread.
C++11's new features and functionality give R users low-level tools (like
threads, mutexes, futures, date-time, and atomic types) that work across
platforms and wouldn't require other external libraries like boost.


+1

portability is really important. One of the points is that by giving 
means to assume a certain standard, we can write more portable code.


There has been lots of discussion about Boost.Thread, etc ... no need 
for that if you have C++11.



Romain, will you be taking pull requests?


Yes. Definitely. I will carefully review them.
Can give you write access too.

I'm getting a good feel of what C++11 brings while developing Rcpp11. 
But I think it makes a lot of sense to write such an article with other 
people as well.


Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] ggplot2: Add '+' operator for aes (uneval) objects

2013-11-04 Thread Thaler,Thorn,LAUSANNE,Applied Mathematics
Dear all,

Is there a reason, why there is no +-operator for aes (i.e. uneval) objects (as 
there is for themes and gg objects)? I had a couple of cases where such an 
operator would be useful, for instance to combine the result of aes and 
aes_string in functions. Any flaws with the following proposition:

 `+.uneval` <- function(e1, e2) {
   dup <- names(e1) %in% names(e2)
   if (any(dup)) {
     duplist <- paste(sQuote(names(e1)[dup]), collapse = ", ")
     msg <- sprintf(ngettext(length(dup),
                    "element %s occurs in both summands - second one gets precedence",
                    "elements %s occur in both summands - second one gets precedence"),
                    duplist)
     warning(msg, domain = NA)
   }
   res <- c(e1[!dup], e2)
   class(res) <- "uneval"
   res
 }

Any thoughts on that?

Kind Regards,

Thorn Thaler 
NRC Lausanne
Applied Mathematics



Re: [Rd] ggplot2: Add '+' operator for aes (uneval) objects

2013-11-04 Thread Brian G. Peterson
This seems like a conversation to have with the package's maintainer 
(Hadley), as suggested by the posting guide:


http://www.r-project.org/posting-guide.html
If the question relates to a contributed package , e.g., one downloaded 
from CRAN, try contacting the package maintainer first.


and not R-devel.

Regards,

Brian

On 11/04/2013 03:54 AM, Thaler,Thorn,LAUSANNE,Applied Mathematics wrote:

Dear all,

Is there a reason, why there is no +-operator for aes (i.e. uneval) objects (as 
there is for themes and gg objects)? I had a couple of cases where such an 
operator would be useful, for instance to combine the result of aes and 
aes_string in functions. Any flaws with the following proposition:

 `+.uneval` <- function(e1, e2) {
   dup <- names(e1) %in% names(e2)
   if (any(dup)) {
     duplist <- paste(sQuote(names(e1)[dup]), collapse = ", ")
     msg <- sprintf(ngettext(length(dup),
                    "element %s occurs in both summands - second one gets precedence",
                    "elements %s occur in both summands - second one gets precedence"),
                    duplist)
     warning(msg, domain = NA)
   }
   res <- c(e1[!dup], e2)
   class(res) <- "uneval"
   res
 }

Any thoughts on that?

Kind Regards,

Thorn Thaler
NRC Lausanne
Applied Mathematics




[Rd] Determining files opened by an R session

2013-11-04 Thread Martin Gregory
I'm using R in a regulated environment and one of the requirements is to 
be able to trace how a result is arrived at. I would like to be able to 
determine which files are opened in read or write mode by an R session, 
for example when a program uses source, sink, file, open, read.table, 
write.table or any of the other functions which can be used to read or 
write files. I'm also interested in output to graphics devices.


I've looked in the documentation but only found information relating to 
profiling. Looking through the source code it seems that much file i/o 
is done via the C functions *_open in main/connections.c but don't see 
anything there that looks like logging.


Could someone let me know if it is possible to log which files are opened?

Regards,
Martin Gregory



Re: [Rd] Determining files opened by an R session

2013-11-04 Thread Bert Gunter
I am not sure R can do what you want (others may), but have a look at

?history

for R's history mechanism, which keeps a record of all commands that
you have entered and so might satisfy your needs.

Note that there are various third-party GUIs/IDEs (e.g. RStudio) that
might be more to your liking. The CRAN web site should contain
information on at least some of them.

Cheers,
Bert


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374



Re: [Rd] Determining files opened by an R session

2013-11-04 Thread Gabriel Becker
If you have the code in a parseable form then CodeDepends will try to do
this:

> library(CodeDepends)
> code
[1] "w = rnorm(10); t = read.csv('mycsv.csv'); lm(y~x, data = t)"
> scr = readScript("dummy", type = "R", txt = code)
> inp = getInputs(scr)
> length(inp)
[1] 3
> inp[[2]]@files
[1] "mycsv.csv"

The package is not currently on CRAN, but is available from the author
Duncan Temple Lang's github account:

https://github.com/duncantl/CodeDepends
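
For readers without the package, the idea of static inspection can be sketched in base R alone. This is not how CodeDepends is implemented; it is a simplified illustration that walks the parsed code and collects string literals passed directly to a hand-picked set of I/O functions. The list io_funs and the helper find_file_literals are made up for this example, and file names built at run time are missed entirely:

```r
# Hypothetical list of functions whose string arguments are treated as file names.
io_funs <- c("read.csv", "read.table", "write.csv", "write.table",
             "source", "sink", "file", "readRDS", "saveRDS")

# Recursively walk a parsed expression and collect string literals that are
# passed directly to one of the functions in io_funs.
find_file_literals <- function(expr) {
  if (!is.call(expr)) return(character(0))
  hits <- character(0)
  fn <- expr[[1]]
  if (is.symbol(fn) && as.character(fn) %in% io_funs) {
    args <- as.list(expr)[-1]
    hits <- unlist(Filter(is.character, args), use.names = FALSE)
  }
  # Descend into every component of the call, including nested calls.
  nested <- unlist(lapply(as.list(expr), find_file_literals), use.names = FALSE)
  c(hits, nested)
}

code <- "w = rnorm(10); t = read.csv('mycsv.csv'); lm(y~x, data = t)"
exprs <- parse(text = code)
files <- unlist(lapply(as.list(exprs), find_file_literals), use.names = FALSE)
files
```

For the three expressions above, the only literal found is the argument to read.csv.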

There is also work going on for a more comprehensive solution:

https://github.com/karthikram/rProvenance


HTH,
~G



-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis



Re: [Rd] Determining files opened by an R session

2013-11-04 Thread Murray Stokely
Most operating systems have tools which allow you to audit the resources
used by a running process, for example the 'lsof' (list open files) command
on Unix and MacOS X.  Or, for more complex dynamic tracing, the DTrace
framework again on MacOS X or BSD Unix.

Not sure what the Windows equivalent would be, or what platform you are
using, but given the number of ways that code in packages and such may be
accessing files in C code possibly based on environment variables or other
configuration parameters, I would want to lean heavily on the operating
system's tools for things like this rather than rely on parsing your R code
looking for specific file access.
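
A rough sketch of the two views from within R, under some assumptions: showConnections() only reports connections opened through R's own connection interface, and the lsof call is only attempted when lsof is found on the PATH. The helper list_open_files is made up for this example. Both give a snapshot of what is open at the moment of the call, not a log over time:

```r
# R's own view: connections opened through R's connection interface.
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "w")
open_now <- showConnections()[, "description"]
close(con)
unlink(tmp)

# The operating system's view, via lsof if it is installed.
list_open_files <- function(pid = Sys.getpid()) {
  lsof <- Sys.which("lsof")
  if (!nzchar(lsof)) return(character(0))   # lsof not available
  suppressWarnings(
    system2(lsof, c("-p", as.character(pid)), stdout = TRUE, stderr = FALSE)
  )
}
os_view <- list_open_files()
```

The lsof output also includes shared libraries and other files opened from C code, which is the point made above about not relying on the R code alone.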

   - Murray





Re: [Rd] Determining files opened by an R session

2013-11-04 Thread Dirk Eddelbuettel

On 4 November 2013 at 14:31, Murray Stokely wrote:
| Most operating systems have tools which allow you to audit the resources
| used by a running process, for example the 'lsof' (list open files) command
| on Unix and MacOS X.  Or, for more complex dynamic tracing, the DTrace
| framework again on MacOS X or BSD Unix.

And strace is standard on Linux.
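
As a sketch of what that could look like when driven from R: the helpers strace_args and run_under_strace, the script name analysis.R and the log file name are all made up for illustration. The strace options used are -f (follow child processes), -e trace=open,openat (record only file-opening system calls) and -o (write the trace to a file):

```r
# Build the argument vector for:
#   strace -f -e trace=open,openat -o <log> Rscript <script>
strace_args <- function(script, log = "strace-files.log") {
  c("-f",                        # follow child processes
    "-e", "trace=open,openat",   # record only file-opening system calls
    "-o", shQuote(log),          # write the trace to a log file
    "Rscript", shQuote(script))
}

# Run a script under strace; stops early if strace is not installed.
run_under_strace <- function(script, log = "strace-files.log") {
  strace <- Sys.which("strace")
  if (!nzchar(strace)) stop("strace not found on PATH")
  system2(strace, strace_args(script, log))
}

cmd_args <- strace_args("analysis.R")
```

The resulting log lists every path the process tried to open, including failed attempts, so it usually needs filtering before it is useful as an audit record.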
 
| Not sure what the Windows equivalent would be, or what platform you are
| using, but given the number of ways that code in packages and such may be
| accessing files in C code possibly based on environment variables or other
| configuration parameters, I would want to lean heavily on the operating
| system's tools for things like this rather than rely on parsing your R code
| looking for specific file access.

Yep.

Dirk

-- 
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
