Re: [Rd] URL checks

2021-01-07 Thread Kirill Müller via R-devel
One other failure mode: SSL certificates trusted by browsers that are 
not installed on the check machine, e.g. the "GEANT Vereniging" 
certificate from https://relational.fit.cvut.cz/ .



K


On 07.01.21 12:14, Kirill Müller via R-devel wrote:

Hi


The URL checks in R CMD check test all links in the README and 
vignettes for broken or redirected links. In many cases this improves 
documentation, but I see problems with this approach, which I have 
detailed below.


I'm writing to this mailing list because I think the change needs to 
happen in R's check routines. I propose to introduce an "allow-list" 
for URLs, to reduce the burden on both CRAN and package maintainers.


Comments are greatly appreciated.


Best regards

Kirill


# Problems with the detection of broken/redirected URLs

## 301 should often be 307, how to change?

Many web sites use a 301 redirection code that probably should be a 
307. For example, https://www.oracle.com and https://www.oracle.com/ 
both redirect to https://www.oracle.com/index.html with a 301. I 
suspect the company still wants oracle.com to be recognized as the 
primary entry point of their web presence (to reserve the right to 
move the redirection to a different location later), though I haven't 
checked with their PR department. If that's true, the redirect 
probably should be a 307, which would have to be fixed by their IT 
department, which I haven't contacted yet either.


$ curl -i https://www.oracle.com
HTTP/2 301
server: AkamaiGHost
content-length: 0
location: https://www.oracle.com/index.html
...

## User agent detection

twitter.com responds with a 400 error to requests whose user agent 
string does not hint at an accepted browser.


$ curl -i https://twitter.com/
HTTP/2 400
...
...Please switch to a supported browser..

$ curl -s -i https://twitter.com/ -A "Mozilla/5.0 (X11; Ubuntu; Linux 
x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" | head -n 1

HTTP/2 200

# Impact

While the latter problem *could* be fixed by supplying a browser-like 
user agent string, the former problem is virtually unfixable -- so 
many web sites should use 307 instead of 301 but don't. The above list 
is also incomplete -- think of unreliable links, HTTP links, other 
failure modes...


This affects me as a package maintainer: I have the choice to either 
change the links to incorrect versions or remove them altogether.


I can also choose to explain each broken link to CRAN, but I think this 
subjects the team to undue burden. Submitting a package with NOTEs 
delays the release of a package which I must release very soon to 
avoid having it pulled from CRAN. I'd rather not risk that -- hence I 
need to remove the links and put them back later.


I'm aware of https://github.com/r-lib/urlchecker, which alleviates the 
problem but ultimately doesn't solve it.


# Proposed solution

## Allow-list

A file inst/URL would list all URLs where failures are allowed -- 
possibly with a list of the HTTP status codes accepted for each link.


Example:

https://oracle.com/ 301
https://twitter.com/drob/status/1224851726068527106 400
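To make the proposal concrete, here is a minimal sketch of how a checker could consult such a file. The function names (`parse_url_allowlist`, `is_allowed`) and the exact file format (one URL per line, followed by accepted status codes) are assumptions for illustration, not an existing API.

```r
# Hypothetical helper: parse an inst/URL allow-list where each line is
# "<url> <code> <code> ...", then look up whether a failure is allowed.
parse_url_allowlist <- function(lines) {
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  data.frame(
    url   = vapply(fields, `[[`, character(1), 1),
    codes = vapply(fields, function(f) paste(f[-1], collapse = ","),
                   character(1)),
    stringsAsFactors = FALSE
  )
}

is_allowed <- function(url, code, allow) {
  i <- match(url, allow$url)
  !is.na(i) && code %in% strsplit(allow$codes[i], ",")[[1]]
}

allow <- parse_url_allowlist(c(
  "https://oracle.com/ 301",
  "https://twitter.com/drob/status/1224851726068527106 400"
))
is_allowed("https://oracle.com/", "301", allow)
```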

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






[Rd] Profiling: attributing costs to place of invocation (instead of place of evaluation)?

2020-02-26 Thread Kirill Müller

Hi


Consider the following example:

f <- function(expr) g(expr)
g <- function(expr) {
  h(expr)
}
h <- function(expr) {
  expr # evaluation happens here
  i(expr)
}
i <- function(expr) {
  expr # already evaluated, no costs here
  invisible()
}

rprof <- tempfile()
Rprof(rprof)
f(replicate(1e2, sample.int(1e4)))
Rprof(NULL)
cat(readLines(rprof), sep = "\n")
#> sample.interval=2
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f"
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f"
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f"

The evaluation of the slow replicate() call is deferred to the execution 
of h(), but there's no replicate() call in h's definition. This makes 
parsing the profile much more difficult than necessary.
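The deferred evaluation can be seen without Rprof. In this sketch (illustration only), the promise records the stack depth at the moment it is forced, which happens inside h(), with f, g and h already on the call stack -- exactly what the profile above reflects:

```r
# Promises are forced at first use, not at the call site: the braced
# expression below runs only when h() touches `expr`.
rec <- new.env()
f <- function(expr) g(expr)
g <- function(expr) h(expr)
h <- function(expr) {
  expr          # evaluation happens here
  invisible()
}
f({ rec$depth_at_force <- sys.nframe(); "forced" })
rec$depth_at_force  # forced deep inside h(), not in f() where written
```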


I have pasted an experimental patch below (off of 3.6.2) that produces 
the following output:


cat(readLines(rprof), sep = "\n")
#> sample.interval=2
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "f"
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "f"
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "f"

This attributes the cost of the replicate() call to f(), where the call 
is actually defined. From my experience, this will give a much better 
understanding of the actual costs of each part of the code. The SIGPROF 
handler looks at sysparent and cloenv before deciding whether an element 
of the call stack is to be included in the profile.


Is there interest in integrating a variant of this patch, perhaps with 
an optional argument to Rprof()?


Thanks!


Best regards

Kirill


Index: src/main/eval.c
===
--- src/main/eval.c    (revision 77857)
+++ src/main/eval.c    (working copy)
@@ -218,7 +218,10 @@
 if (R_Line_Profiling)
 lineprof(buf, R_getCurrentSrcref());

+    SEXP sysparent = NULL;
+
 for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
+    if (sysparent != NULL && cptr->cloenv != sysparent && cptr->sysparent != sysparent) continue;

 if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
     && TYPEOF(cptr->call) == LANGSXP) {
     SEXP fun = CAR(cptr->call);
@@ -292,6 +295,8 @@
         else
         lineprof(buf, cptr->srcref);
     }
+
+        sysparent = cptr->sysparent;
     }
 }
 }

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Check length of logical vector also for operands of || and &&?

2019-08-19 Thread Kirill Müller

Hi everyone


The following behavior (in R 3.6.1 and R-devel r77040) caught me by 
surprise today:


truthy <- c(TRUE, FALSE)
falsy <- c(FALSE, TRUE, FALSE)

if (truthy) "check"
#> Warning in if (truthy) "check": the condition has length > 1 and only the
#> first element will be used
#> [1] "check"
if (falsy) "check"
#> Warning in if (falsy) "check": the condition has length > 1 and only the
#> first element will be used
if (FALSE || truthy) "check"
#> [1] "check"
if (FALSE || falsy) "check"
if (truthy || FALSE) "check"
#> [1] "check"
if (falsy || FALSE) "check"

The || operator gobbles the warning about a length > 1 vector. I wonder 
if the existing checks for length 1 can be extended to the operands of 
the || and && operators. Thanks (and apologies if this has been raised 
before).
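Until the operators themselves check their operands, a defensive helper can enforce scalar conditions explicitly. The helper name below is hypothetical; note that R 4.3.0 later did make length > 1 operands of && and || an error.

```r
# Hypothetical guard: insist that a condition operand is a single
# non-missing logical, instead of silently using the first element.
as_scalar_logical <- function(x) {
  if (!is.logical(x) || length(x) != 1L || is.na(x)) {
    stop("condition must be a single TRUE or FALSE, not length ",
         length(x))
  }
  x
}

truthy <- c(TRUE, FALSE)
res <- tryCatch(
  if (FALSE || as_scalar_logical(truthy)) "check" else "no",
  error = function(e) "caught"
)
res  # "caught": the length-2 operand is rejected instead of truncated
```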



Best regards

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] active bindings in package namespace

2019-03-23 Thread Kirill Müller

Dear Jack


This doesn't answer your question, but I would advise against this design.

- Users do not expect side effects (such as network access) from 
accessing a symbol.


- A function gives you much more flexibility to change the interface 
later on. (Arguments for fetching the data, tokens for API access, ...)


- You already encountered a few quirks that make this an "interesting" 
problem.


A function call only needs a pair of parentheses.
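As a sketch of that advice (names are hypothetical; the real package would ask consent, download and parse on first access), the function-plus-cache pattern looks like this:

```r
# Per-session cache in the package namespace; the user calls
# icd10fr2019() -- only the parentheses differ from the binding design,
# but arguments (refresh, tokens, ...) can be added later.
.icd_cache <- new.env(parent = emptyenv())

icd10fr2019 <- function(refresh = FALSE) {
  if (refresh || is.null(.icd_cache$icd10fr2019)) {
    # real package: ask consent, download and parse here (stubbed)
    .icd_cache$icd10fr2019 <-
      data.frame(code = character(), label = character(),
                 stringsAsFactors = FALSE)
  }
  .icd_cache$icd10fr2019
}
```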


Best regards

Kirill


On 23.03.19 16:50, Jack O. Wasey wrote:

Dear all,

I am developing a package which is a front end for various online data 
(icd.data, https://github.com/jackwasey/icd.data/). The current CRAN 
version just has lazy-loaded data, but now the package encompasses far 
more current and historic ICD codes from different countries; these 
can't be included in the CRAN package even with maximal compression.


Other authors have solved this using functions to get the data, with 
or without a local cache of the retrieved data. No CRAN or other 
packages I have found after extensive searching use the attractive 
active binding feature of R.


The goal is simple: for the user to refer to the data by its symbol, 
e.g., 'icd10fr2019', or 'icd.data::icd10fr2019', and it will be 
downloaded and parsed transparently (if the user has already granted 
permission, or after prompt if they haven't).


The bindings are set using commands alongside the function definitions 
in R/*.R. E.g.


makeActiveBinding("icd10cm_latest", .icd10cm_latest_binding, 
environment())

lockBinding("icd10cm_latest", environment())

For non-interactive use, CI and CRAN tests, no data should be 
downloaded, and no cache directory set up without user consent. For 
interactive use, I ask permission to create a local data cache before 
downloading data.


This works fine... until R CMD check. The following steps seem to 
'get' or 'source' everything from the package namespace, which 
triggers the active bindings; this fails if I am unable to get consent 
to download data and want to 'stop' on this error condition.

 - checking dependencies in R code
 - checking S3 generic/method consistency
 - checking foreign function calls
 - checking R code for possible problems

Debugging CI-specific binding bugs is a nightmare because these occur 
in different R sessions initiated by R CMD check.


There may be legitimate reasons to evaluate everything in the 
namespace, but I've no idea what they are. Incidentally, RStudio also 
does 'mget' on the whole package namespace and triggers the bindings 
during autocomplete: https://github.com/rstudio/rstudio/issues/4414


Is this something I should raise as an issue with R? Or does anyone 
have an idea of a sensible approach to this? Currently I have a set 
of workarounds, but they complicate the code and have taken an awful 
lot of time. Does anyone know of any CRAN package that has active 
bindings in the package namespace?


Any ideas appreciated.

Jack Wasey

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel




Re: [Rd] bias issue in sample() (PR 17494)

2019-02-26 Thread Kirill Müller

Ralf


I don't doubt this is expected with the current implementation; I doubt 
that the implementation is desirable. I suggest turning this into


pbirthday(1e6, classes = 2^53)
## [1] 5.550956e-05

(which is still non-zero, but much less likely to cause confusion).


Best regards

Kirill

On 26.02.19 10:18, Ralf Stubner wrote:

Kirill,

I think some level of collision is actually expected! R uses a 32-bit MT
that can produce 2^32 different doubles. The probability of a collision
within a million draws is


pbirthday(1e6, classes = 2^32)

[1] 1

Greetings
Ralf


On 26.02.19 07:06, Kirill Müller wrote:

Gabe


As mentioned on Twitter, I think the following behavior should be fixed
as part of the upcoming changes:

R.version.string
## [1] "R Under development (unstable) (2019-02-25 r76160)"
.Machine$double.digits
## [1] 53
set.seed(123)
RNGkind()
## [1] "Mersenne-Twister" "Inversion"    "Rejection"
length(table(runif(1e6)))
## [1] 999863
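A back-of-envelope calculation matches this count: with n draws from N equally likely values, roughly n^2 / (2N) duplicates are expected. This is a rough approximation (assuming draws uniform over 2^32 distinct doubles), not an exact birthday computation:

```r
# Approximate expected number of duplicate values among n uniform
# draws from N distinct values: n^2 / (2 * N).
n <- 1e6
N <- 2^32
expected_dups <- n^2 / (2 * N)
round(expected_dups)  # ~116, same order as the 1e6 - 999863 = 137 above
```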

I don't expect any collisions when using Mersenne-Twister to generate a
million floating point values. I'm not sure what causes this behavior,
but it's documented in ?Random:

"Do not rely on randomness of low-order bits from RNGs. Most of the
supplied uniform generators return 32-bit integer values that are
converted to doubles, so they take at most 2^32 distinct values and long
runs will return duplicated values (Wichmann-Hill is the exception, and
all give at least 30 varying bits.)"

The "Wichmann-Hill" bit is interesting:

RNGkind("Wichmann-Hill")
length(table(runif(1e6)))
## [1] 100
length(table(runif(1e6)))
## [1] 100

Mersenne-Twister has a much larger period than Wichmann-Hill; it would 
be great to see the above behavior for Mersenne-Twister as well. 
Thanks for considering.


Best regards

Kirill


On 20.02.19 08:01, Gabriel Becker wrote:

Luke,

I'm happy to help with this. It's great to see this get tackled (I've
cc'ed Kelli Ottoboni, who helped flag this issue).

I can prepare a patch for the RNGkind related stuff and the doc update.

As for ???, what are your (and others') thoughts about the possibility of
a) a reproducibility API which takes either an R version (or maybe
alternatively a date) and sets the RNGkind to the default for that
version/date, and/or b) modifying sessionInfo to capture (and
display) the RNGkind in effect?

Best,
~G


On Tue, Feb 19, 2019 at 11:52 AM Tierney, Luke 
wrote:


Before the next release we really should sort out the bias issue in
sample() reported by Ottoboni and Stark in
https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and
filed as a bug report by Duncan Murdoch at
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.

Here are two examples of bad behavior through current R-devel:

   set.seed(123)
   m <- (2/5) * 2^32
   x <- sample(m, 100, replace = TRUE)
   table(x %% 2, x > m / 2)
   ##
   ##    FALSE   TRUE
   ## 0 300620 198792
   ## 1 200196 300392

   table(sample(2/7 * 2^32, 100, replace = TRUE) %% 2)
   ##
   ##  0  1
   ## 429054 570946

I committed a modification to R_unif_index to address this by
generating random bits (blocks of 16) and rejection sampling, but for
now this is only enabled if the environment variable R_NEW_SAMPLE is
set before the first call.
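The rejection idea described above can be sketched in R. This is illustration only -- it is not R's implementation, which works on raw generator bits rather than on runif() output:

```r
# Draw uniformly from 1..n without modulo bias: generate k random bits,
# interpret them as an integer in [0, 2^k), and reject values >= n.
sample_unbiased <- function(n) {
  k <- ceiling(log2(n))
  repeat {
    bits <- runif(k) < 0.5                 # k pseudo-random bits
    v <- sum(bits * 2^(seq_len(k) - 1))    # integer in [0, 2^k)
    if (v < n) return(v + 1)
  }
}

set.seed(42)
x <- replicate(2000, sample_unbiased(10))
table(x)  # roughly uniform over 1..10
```

Every accepted k-bit value is equally likely, so the result is exactly uniform over 1..n, at the cost of an occasional retry.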

Some things still needed:

- someone to look over the change and see if there are any issues
- adjustment of RNGkind to allow the old behavior to be selected
- make the new behavior the default
- adjust documentation
- ???

Unfortunately I don't have enough free cycles to do this, but I can
help if someone else can take the lead.

There are two other places I found that might suffer from the same
issue, in walker_ProbSampleReplace (pointed out by O & S) and in
src/nmath/wilcox.c.  Both can be addressed by using R_unif_index. I
have done that for walker_ProbSampleReplace, but the wilcox change
might need adjusting to support the standalone math library and I
don't feel confident enough I'd get that right.

Best,

luke


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and    Fax:   319-335-3017
  Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






[Rd] Dots are not fixed by make.names()

2018-10-05 Thread Kirill Müller

Hi


It seems that names of the form "..#" and "..." are not fixed by 
make.names(), even though they are reserved words. The documentation reads:


> [...] Names such as ".2way" are not valid, and neither are the 
reserved words.


> Reserved words in R: [...] ... and ..1, ..2 etc, which are used to 
refer to arguments passed down from a calling function, see ?... .


I have pasted a reproducible example below.

I'd like to suggest converting these to "...#" and "", respectively. 
Happy to contribute a PR.



Best regards

Kirill


make.names(c("..1", "..13", "..."))
#> [1] "..1"  "..13" "..."
`..1` <- 1
`..13` <- 13
`...` <- "dots"

mget(c("..1", "..13", "..."))
#> $..1
#> [1] 1
#>
#> $..13
#> [1] 13
#>
#> $...
#> [1] "dots"
`..1`
#> Error in eval(expr, envir, enclos): the ... list does not contain any 
elements

`..13`
#> Error in eval(expr, envir, enclos): the ... list does not contain 13 
elements

`...`
#> Error in eval(expr, envir, enclos): '...' used in an incorrect context
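Until make.names() handles these, a wrapper can post-process the reserved dot-dot names. fix_names() is a hypothetical helper, and appending a dot is just one possible repair:

```r
# Append "." to names that are still reserved words after
# make.names(), i.e. "...", "..1", "..2", ...
fix_names <- function(x) {
  x <- make.names(x)
  bad <- grepl("^\\.\\.(\\.|[0-9]+)$", x)
  x[bad] <- paste0(x[bad], ".")
  x
}

fix_names(c("..1", "..13", "...", "a b"))
```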

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Usage of PROTECT_WITH_INDEX in R-exts

2017-06-09 Thread Kirill Müller

On 09.06.2017 13:23, Martin Maechler wrote:

Kirill Müller <kirill.muel...@ivt.baug.ethz.ch>
 on Thu, 8 Jun 2017 12:55:26 +0200 writes:

 > On 06.06.2017 22:14, Kirill Müller wrote:
 >>
 >>
 >> On 06.06.2017 10:07, Martin Maechler wrote:
 >>>>>>>> Kirill Müller <kirill.muel...@ivt.baug.ethz.ch> on
 >>>>>>>> Mon, 5 Jun 2017 17:30:20 +0200 writes:
 >>> > Hi I've noted a minor inconsistency in the
 >>> documentation: > Current R-exts reads
 >>>
 >>> > s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env), &ipx);
 >>>
 >>> > but I believe it has to be
 >>>
 >>> > PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), &ipx);
 >>>
 >>> > because PROTECT_WITH_INDEX() returns void.
 >>>
 >>> Yes indeed, thank you Kirill!
 >>>
 >>> note that the same is true for its partner
 >>> function|macro REPROTECT()
 >>>
 >>> However, as PROTECT() is used a gazillion times and
 >>> PROTECT_WITH_INDEX() is used about 100 x less, and
 >>> PROTECT() *does* return the SEXP, I do wonder why
 >>> PROTECT_WITH_INDEX() and REPROTECT() could not behave
 >>> the same as PROTECT() (a view at the source code seems
 >>> to suggest a change to be trivial).  I assume usual
 >>> compiler optimization would not create less efficient
 >>> code in case the idiom PROTECT_WITH_INDEX(s = ...)  is
 >>> used, i.e., in case the return value is not used ?
 >>>
 >>> Maybe this is mainly a matter of taste, but I find the
 >>> use of
 >>>
 >>> SEXP s = PROTECT();
 >>>
 >>> quite nice in typical cases where this appears early in
 >>> a function.  Also for that reason -- but even more for
 >>> consistency -- it would also be nice if
 >>> PROTECT_WITH_INDEX() behaved the same.
 >> Thanks, Martin, this sounds reasonable. I've put together
 >> a patch for review [1], a diff for applying to SVN (via
 >> `cat | patch -p1`) would be [2]. The code compiles on my
 >> system.
 >>
 >>
 >> -Kirill
 >>
 >>
 >> [1] https://github.com/krlmlr/r-source/pull/5/files
 >>
 >> [2]
 >> https://patch-diff.githubusercontent.com/raw/krlmlr/r-source/pull/5.diff

 > I forgot to mention that this patch applies cleanly to r72768.

Thank you, Kirill.
I've been a bit busy so did not get to reply more quickly.

Just to be clear: I did not ask for a patch but was _asking_ /
requesting comments about the possibility to do that.

In the mean time, within the core team, the opinions were
mixed and costs of the change (recompilations needed, C source level
check tools would need updating / depend on R versions) are
clearly non-zero.

As a consequence, we will fix the documentation rather than changing the API.
Thanks for looking into this. The patch was more a proof of concept, I 
don't mind throwing it away.



-Kirill

Martin


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Usage of PROTECT_WITH_INDEX in R-exts

2017-06-08 Thread Kirill Müller

On 06.06.2017 22:14, Kirill Müller wrote:



On 06.06.2017 10:07, Martin Maechler wrote:

Kirill Müller <kirill.muel...@ivt.baug.ethz.ch>
 on Mon, 5 Jun 2017 17:30:20 +0200 writes:

 > Hi I've noted a minor inconsistency in the documentation:
 > Current R-exts reads

 > s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env), &ipx);

 > but I believe it has to be

 > PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), &ipx);

 > because PROTECT_WITH_INDEX() returns void.

Yes indeed, thank you Kirill!

note that the same is true for its partner function|macro REPROTECT()

However, as  PROTECT() is used a gazillion times  and
PROTECT_WITH_INDEX() is used about 100 x less, and PROTECT()
*does* return the SEXP,
I do wonder why PROTECT_WITH_INDEX() and REPROTECT() could not
behave the same as PROTECT()
(a view at the source code seems to suggest a change to be trivial).
I assume usual compiler optimization would not create less
efficient code in case the idiom   PROTECT_WITH_INDEX(s = ...)
is used, i.e., in case the return value is not used ?

Maybe this is mainly a matter of taste,  but I find the use of

SEXP s = PROTECT();

quite nice in typical cases where this appears early in a function.
Also for that reason -- but even more for consistency -- it
would also be nice if  PROTECT_WITH_INDEX()  behaved the same.
Thanks, Martin, this sounds reasonable. I've put together a patch for 
review [1], a diff for applying to SVN (via `cat | patch -p1`) would 
be [2]. The code compiles on my system.



-Kirill


[1] https://github.com/krlmlr/r-source/pull/5/files

[2] 
https://patch-diff.githubusercontent.com/raw/krlmlr/r-source/pull/5.diff


I forgot to mention that this patch applies cleanly to r72768.


-Kirill






Martin

 > Best regards
 > Kirill


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




[Rd] Usage of PROTECT_WITH_INDEX in R-exts

2017-06-05 Thread Kirill Müller

Hi


I've noted a minor inconsistency in the documentation: Current R-exts reads

s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env), &ipx);

but I believe it has to be

PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), &ipx);

because PROTECT_WITH_INDEX() returns void.


Best regards

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] source(), parse(), and foreign UTF-8 characters

2017-05-09 Thread Kirill Müller

On 09.05.2017 13:19, Duncan Murdoch wrote:

On 09/05/2017 3:42 AM, Kirill Müller wrote:

Hi


I'm having trouble sourcing or parsing a UTF-8 file that contains
characters that are not representable in the current locale ("foreign
characters") on Windows. The source() function stops with an error, the
parse() function reencodes all foreign characters using the <U+>
notation. I have added a reproducible example below the message.

This seems well within the bounds of documented behavior, although the
documentation to source() could mention that the file can't contain
foreign characters. Still, I'd prefer if UTF-8 "just worked" in R, and
I'm willing to invest substantial time to help with that. Before
starting to write a detailed proposal, I feel that I need a better
understanding of the problem, and I'm grateful for any feedback you
might have.

I have looked into character encodings in the context of the dplyr
package, and I have observed the following behavior:

- Strings are treated preferentially in the native encoding
- Only upon specific request (via translateCharUTF8() or enc2utf8() or
...), they are translated to UTF-8 and marked as such
- On UTF-8 systems, strings are never marked as UTF-8
- ASCII strings are marked as ASCII internally, but this information
doesn't seem to be available, e.g., Encoding() returns "unknown" for
such strings
- Most functions in R are encoding-agnostic: they work the same
regardless if they receive a native or UTF-8 encoded string if they are
properly tagged
- One important difference are symbols, which must be in the native
encoding (and are always converted to native encoding, using <U+>
escapes)
- I/O is centered around the native encoding, e.g., writeLines() always
reencodes to the native encoding
- There is the "bytes" encoding which avoids reencoding.

I haven't looked into serialization or plot devices yet.

The conclusion to the "UTF-8 manifesto" [1] suggests "... to use UTF-8
narrow strings everywhere and convert them back and forth when using
platform APIs that don’t support UTF-8 ...". (It is written in the
context of the UTF-16 encoding used internally on Windows, but seems to
apply just the same here for the native encoding.) I think that Unicode
support in R could be greatly improved if we follow these guidelines.
This seems to mean:

- Convert strings to UTF-8 as soon as possible, and mark them as such
(also on systems where UTF-8 is the native encoding)
- Translate to native only upon specific request, e.g., in calls to API
functions or perhaps for .C()
- Use UTF-8 for symbols
- Avoid the forced round-trip to the native encoding in I/O functions
and for parsing (but still read/write native by default)
- Carefully look into serialization and plot devices
- Add helper functions that simplify mundane tasks such as
reading/writing a UTF-8 encoded file


Those are good long term goals, though I think the effort is easier 
than you think.  Rather than attempting to do it all at once, you 
should look for ways to do it gradually and submit self-contained 
patches.  In many cases it doesn't matter if strings are left in the 
local encoding, because the encoding doesn't matter.  The problems 
arise when UTF-8 strings are converted to the local encoding before 
it's necessary, because that's a lossy conversion.  So a simple way to 
proceed is to identify where these conversions occur, and remove them 
one-by-one.
Thanks, Duncan, this looks like a good start indeed. Did you really mean 
to say "the effort is easier than I think"? It would be great if I had 
overestimated the effort, but I seldom do. That said, I'd be grateful if you 
could review/integrate/... future patches of mine towards parsing and 
sourcing of UTF-8 files with foreign characters, this problem seems to 
be self-contained (but perhaps not that easy).


I still think symbols should be in UTF-8, and this change might be 
difficult to split into smaller changes, especially if taking into 
account serialization and other potential pitfalls.




Currently I'm working on bug 16098, "Windows doesn't handle high 
Unicode code points".  It doesn't require many changes at all to 
handle input of those characters; all the remaining issues are 
avoiding the problems you identify above.  The origin of the issue is 
the fact that in Windows wchar_t is only 16 bits (not big enough to 
hold all Unicode code points).  As far as I know, Windows has no 
standard type to hold a Unicode code point; most of the run-time 
functions still use the 16-bit wchar_t.

I didn't mention non-BMP characters, they are an important issue as well.



I think once that bug is dealt with, 90+% of the remaining issues 
could be solved by avoiding translateChar on Windows.  This could be 
done by avoiding it everywhere, or by acting as though Windows is 
running in a UTF-8 locale until you actually need to write to a file.  
Other s

[Rd] source(), parse(), and foreign UTF-8 characters

2017-05-09 Thread Kirill Müller

Hi


I'm having trouble sourcing or parsing a UTF-8 file that contains 
characters that are not representable in the current locale ("foreign 
characters") on Windows. The source() function stops with an error, the 
parse() function reencodes all foreign characters using the <U+> 
notation. I have added a reproducible example below the message.


This seems well within the bounds of documented behavior, although the 
documentation to source() could mention that the file can't contain 
foreign characters. Still, I'd prefer if UTF-8 "just worked" in R, and 
I'm willing to invest substantial time to help with that. Before 
starting to write a detailed proposal, I feel that I need a better 
understanding of the problem, and I'm grateful for any feedback you 
might have.


I have looked into character encodings in the context of the dplyr 
package, and I have observed the following behavior:


- Strings are treated preferentially in the native encoding
- Only upon specific request (via translateCharUTF8() or enc2utf8() or 
...), they are translated to UTF-8 and marked as such

- On UTF-8 systems, strings are never marked as UTF-8
- ASCII strings are marked as ASCII internally, but this information 
doesn't seem to be available, e.g., Encoding() returns "unknown" for 
such strings
- Most functions in R are encoding-agnostic: they work the same 
regardless of whether they receive a native or a UTF-8 encoded string, 
provided it is properly tagged
- One important difference is symbols, which must be in the native 
encoding (and are always converted to native encoding, using <U+> 
escapes)
- I/O is centered around the native encoding, e.g., writeLines() always 
reencodes to the native encoding

- There is the "bytes" encoding which avoids reencoding.

I haven't looked into serialization or plot devices yet.
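The observations above can be illustrated directly at the console (a brief sketch; the exact encoding marks can depend on the build and locale):

```r
# Encoding marks on R strings, illustrating the observations above.
x <- enc2utf8("Gl\u00fcck")    # explicit translation: marked as UTF-8
Encoding(x)                    # "UTF-8"
Encoding("abc")                # "unknown": the internal ASCII flag is not exposed
y <- iconv(x, from = "UTF-8", to = "latin1")
Encoding(y)                    # "latin1"
```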

The conclusion to the "UTF-8 manifesto" [1] suggests "... to use UTF-8 
narrow strings everywhere and convert them back and forth when using 
platform APIs that don’t support UTF-8 ...". (It is written in the 
context of the UTF-16 encoding used internally on Windows, but seems to 
apply just the same here for the native encoding.) I think that Unicode 
support in R could be greatly improved if we follow these guidelines. 
This seems to mean:


- Convert strings to UTF-8 as soon as possible, and mark them as such 
(also on systems where UTF-8 is the native encoding)
- Translate to native only upon specific request, e.g., in calls to API 
functions or perhaps for .C()

- Use UTF-8 for symbols
- Avoid the forced round-trip to the native encoding in I/O functions 
and for parsing (but still read/write native by default)

- Carefully look into serialization and plot devices
- Add helper functions that simplify mundane tasks such as 
reading/writing a UTF-8 encoded file
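The last point might look roughly like this (names and signatures are purely illustrative, not an existing or proposed API):

```r
# Hypothetical helpers for reading/writing UTF-8 files without a lossy
# round-trip through the native encoding.
read_lines_utf8 <- function(path) {
  # readLines(encoding = "UTF-8") marks the strings as UTF-8 without
  # translating them, assuming the file really is UTF-8 encoded
  readLines(path, encoding = "UTF-8", warn = FALSE)
}

write_lines_utf8 <- function(lines, path) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  # useBytes = TRUE writes the UTF-8 bytes as-is, skipping the
  # conversion to the native encoding that writeLines does by default
  writeLines(enc2utf8(lines), con, useBytes = TRUE)
}
```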


I'm sure I've missed many potential pitfalls; your input is greatly 
appreciated. Thanks for your attention.


Further resources: A write-up by Prof. Ripley [2], a section in R-ints 
[3], a blog post by Ista Zahn [4], a StackOverflow search [5].



Best regards

Kirill



[1] http://utf8everywhere.org/#conclusions

[2] https://developer.r-project.org/Encodings_and_R.html

[3] 
https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Encodings-for-CHARSXPs


[4] 
http://people.fas.harvard.edu/~izahn/posts/reading-data-with-non-native-encoding-in-r/


[5] 
http://stackoverflow.com/search?tab=votes&q=%5br%5d%20encoding%20windows%20is%3aquestion




# Use one of the following:
id <- "Gl\u00fcck"
id <- "\u5e78\u798f"
id <- "\u0441\u0447\u0430\u0441\u0442\u044c\u0435"
id <- "\ud589\ubcf5"

file_contents <- paste0('"', id, '"')
Encoding(file_contents)
raw_file_contents <- charToRaw(file_contents)

path <- tempfile(fileext = ".R")
writeBin(raw_file_contents, path)
file.size(path)
length(raw_file_contents)

# Escapes the string
parse(text = file_contents)

# Throws an error
print(source(path, encoding = "UTF-8"))

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Upgrading a package to which other packages are LinkingTo

2016-12-16 Thread Kirill Müller

Thanks for discussing this.

On 16.12.2016 17:19, Dirk Eddelbuettel wrote:

On 16 December 2016 at 11:00, Duncan Murdoch wrote:
| On 16/12/2016 10:40 AM, Dirk Eddelbuettel wrote:
| > On 16 December 2016 at 10:14, Duncan Murdoch wrote:
| > | On 16/12/2016 8:37 AM, Dirk Eddelbuettel wrote:
| > | >
| > | > On 16 December 2016 at 08:20, Duncan Murdoch wrote:
| > | > | Perhaps the solution is to recommend that packages which export their
| > | > | C-level entry points either guarantee them not to change or offer
| > | > | (require?) version checks by user code.  So dplyr should start out by
| > | > | saying "I'm using Rcpp interface 0.12.8".  If Rcpp has a new version
| > | > | with a compatible interface, it replies "that's fine".  If Rcpp has
| > | > | changed its interface, it says "Sorry, I don't support that any more."
Sounds good to me, I was considering something similar. dplyr can simply 
query Rcpp's current version in .onLoad(), compare it to the version at 
installation time and act accordingly.
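A minimal sketch of such a check (the recorded version is hard-coded here for illustration; a real package would capture it automatically at install time):

```r
.onLoad <- function(libname, pkgname) {
  built_against <- package_version("0.12.8")  # captured at build/install time
  current <- utils::packageVersion("Rcpp")    # version available now
  if (current < built_against) {
    stop("dplyr was built against Rcpp ", built_against, " but ",
         current, " is installed; please reinstall dplyr.")
  }
}
```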

| > | >
| > | > We try. But it's hard, and I'd argue, likely impossible.
| > | >
| > | > For example I even added a "frozen" package [1] in the sources / unit 
tests
| > | > to test for just this. In practice you just cannot hit every possible 
access
| > | > point of the (rich, in our case) API so the tests pass too often.
| > | >
| > | > Which is why we relentlessly test against reverse-depends to _at least 
ensure
| > | > buildability_ from our releases.
| >
| > I meant to also add:  "... against a large corpus of other packages."
| > The intent is to empirically answer this.
| >
| > | > As for seamless binary upgrade, I don't think in can work in practice.  
Ask
| > | > Uwe one day we he rebuilds everything every time on Windows. And for 
what it
| > | > is worth, we essentially do the same in Debian.
| > | >
| > | > Sometimes you just need to rebuild.  That may be the price of admission 
for
| > | > using the convenience of rich C++ interfaces.
| > | >
| > |
| > | Okay, so would you say that Kirill's suggestion is not overkill?  Every
| > | time package B uses LinkingTo: A, R should assume it needs to rebuild B
| > | when A is updated?
| >
| > Based on my experience it is a "halting problem" -- i.e. cannot know ex ante.
| >
| > So "every time" would be overkill to me.  Sometimes you know you must
| > recompile (but try to be very prudent with public-facing API).  Many times
| > you do not. It is hard to pin down.
I'd argue that recompiling/reinstalling B is cheap enough and the safest 
option. So unless there is a risk, why not simply do it every time A 
updates? This could be implemented with a perhaps small change in R: 
When installing A, treat all packages that have A in both LinkingTo and 
Imports as dependencies that need to be reinstalled.
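One way the affected packages could be identified (a sketch; the function name and the exact matching rule are illustrative):

```r
# Find installed packages that have `pkg` in both LinkingTo and Imports,
# i.e. the candidates for reinstallation when `pkg` is updated.
pkgs_needing_rebuild <- function(pkg, lib.loc = NULL) {
  db <- utils::installed.packages(lib.loc)
  in_field <- function(field) {
    vals <- db[, field]
    vals[is.na(vals)] <- ""
    # match the package name as a whole entry, optionally with a version spec
    grepl(paste0("(^|,)[[:space:]]*", pkg,
                 "([[:space:]]*\\(|[[:space:]]*,|[[:space:]]*$)"), vals)
  }
  rownames(db)[in_field("LinkingTo") & in_field("Imports")]
}
```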



-Kirill

| >
| > At work we have a bunch of servers with Rcpp and many packages against them
| > (installed system-wide for all users). We _very rarely_ need rebuilds.

| So that comes back to my suggestion:  you should provide a way for a

| dependent package to ask if your API has changed.  If you say it hasn't,
| the package is fine.  If you say it has, the package should abort,
| telling the user they need to reinstall it.  (Because it's a hard
| question to answer, you might get it wrong and say it's fine when it's
| not.  But that's easy to fix:  just make a new release that does require

Sure.

We have always increased the higher-order version number when that is needed.

One problem with your proposal is that the testing code may run after the
package load, and in the case where it matters ... that very code may not get
reached because the package didn't load.

Dirk





Re: [Rd] withAutoprint({ .... }) ?

2016-09-27 Thread Kirill Müller

On 25.09.2016 18:29, Martin Maechler wrote:

I'm now committing my version (including (somewhat incomplete)
documentation, so you (all) can look at it and try / test it further.
Thanks, that's awesome. Is `withAutoprint()` recursive? How about 
calling the new function in `example()` (instead of `source()` as it is 
now) so that examples are always rendered in auto-print mode? That may 
add some extra output to examples (which can be removed easily), but 
would solve the original problem in a painless way.



-Kirill



Re: [Rd] withAutoprint({ .... }) ?

2016-09-02 Thread Kirill Müller

On 02.09.2016 14:38, Duncan Murdoch wrote:

On 02/09/2016 7:56 AM, Martin Maechler wrote:

On R-help, with subject
   '[R] source() does not include added code'


Joshua Ulrich 
on Wed, 31 Aug 2016 10:35:01 -0500 writes:


> I have quantstrat installed and it works fine for me. If you're
> asking why the output of t(tradeStats('macross')) isn't being 
printed,

> that's because of what's described in the first paragraph in the
> *Details* section of help("source"):

> Note that running code via ‘source’ differs in a few respects from
> entering it at the R command line.  Since expressions are not
> executed at the top level, auto-printing is not done. So you will
> need to include explicit ‘print’ calls for things you want to be
> printed (and remember that this includes plotting by ‘lattice’,
> FAQ Q7.22).



> So you need:

> print(t(tradeStats('macross')))

> if you want the output printed to the console.

indeed, and "of course" ;-)

As my subject indicates, this is another case, where it would be
very convenient to have a function

   withAutoprint()

so the OP could have (hopefully) have used
   withAutoprint(source(..))
though that would have been equivalent to the already nicely existing

   source(.., print.eval = TRUE)

which works via the  withVisible(.) utility that returns for each
'expression' if it would auto print or not, and then does print (or
not) accordingly.
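The withVisible() utility mentioned here can be observed in isolation:

```r
# withVisible() evaluates an expression and reports whether it would
# auto-print at top level.
str(withVisible(1 + 1))          # value 2, visible TRUE: would auto-print
str(withVisible(invisible(42)))  # visible FALSE: would not be printed
str(withVisible(x <- 5))         # assignments are invisible as well
```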

My own use cases for such a withAutoprint({...})
are demos and examples, sometimes even package tests which I want to 
print:


Assume I have a nice demo / example on a help page/ ...

foo(..)
(z <- bar(..))
summary(z)


where I carefully do print parts (and don't others),
and suddenly I find I want to run that part of the demo /
example / test only in some circumstances, e.g., only when
interactive, but not in BATCH, or only if it is me, the package 
maintainer,


if( identical(Sys.getenv("USER"), "maechler") ) {
  foo(..)
  (z <- bar(..))
  summary(z)
  
}

Now all the auto-printing is gone, and

1) I have to find out which of these function calls do autoprint and 
wrap

   a print(..) around these, and

2) the result is quite ugly (for an example on a help page etc.)

What I would like in a future R is to be able to simply wrap the "{
.. }" above with a withAutoprint(.):

if( identical(Sys.getenv("USER"), "maechler") ) withAutoprint({
  foo(..)
  (z <- bar(..))
  summary(z)
  
})

Conceptually such a function could be written similar to source() 
with an R
level for loop, treating each expression separately, calling eval(.) 
etc.
That may cost too much performance, ... still to have it would be 
better than

not having the possibility.



If you read so far, you'd probably agree that such a function
could be a nice asset in R,
notably if it was possible to do this on the fast C level of R's main
REPL.

Have any of you looked into how this could be provided in R ?
If you know the source a little, you will remember that there's
the global variable  R_Visible  which is crucial here.
The problem with that is that it *is* global, and only available
as that; that the auto-printing "concept" is so linked to "toplevel 
context"
and that is not easy, and AFAIK not so much centralized in one place 
in the
source. Consequently, all kind of (very) low level functions 
manipulate R_Visible

temporarily and so a C level implementation of withAutoprint() may
need considerable more changes than just setting R_Visible to TRUE in 
one

place.

Have any efforts / experiments already happened towards providing such
functionality ?


I don't think the performance cost would matter.  If you're printing 
something, you're already slow.  So doing this at the R level would 
make most sense to me --- that's how Sweave and source and knitr do 
it, so it can't be that bad.


Duncan Murdoch

A C-level implementation would bring the benefit of a lean traceback() 
in case of an error. I suspect eval() could be enhanced to auto-print.
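Martin's conceptual description (an R-level loop over the expressions, evaluating each one and printing when it would be visible) can be sketched as follows; withAutoprint_sketch is an illustrative name, not the actual implementation:

```r
withAutoprint_sketch <- function(code, envir = parent.frame()) {
  ex <- substitute(code)
  # accept either a single expression or a braced block
  exprs <- if (is.call(ex) && identical(ex[[1L]], as.name("{"))) {
    as.list(ex)[-1L]
  } else {
    list(ex)
  }
  for (e in exprs) {
    res <- withVisible(eval(e, envir))  # eval() propagates visibility
    if (res$visible) print(res$value)
  }
  invisible()
}
```

For example, withAutoprint_sketch({ z <- 21 * 2; z }) prints only the value of the second expression, mirroring top-level behavior.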


By the same token it would be extremely helpful to have a C-level 
implementation of local() which wouldn't litter the stack trace.



-Kirill


Re: [Rd] R process killed when allocating too large matrix (Mac OS X)

2016-05-13 Thread Kirill Müller

On 12.05.2016 09:51, Martin Maechler wrote:

 > My ulimit package exposes this API ([1], should finally submit it to
 > CRAN); unfortunately this very API seems to be unsupported on OS X
 > [2,3]. Last time I looked into it, neither of the documented settings
 > achieved the desired effect.

 > -Kirill

 > [1] http://krlmlr.github.io/ulimit
 > [2]
 > 
http://stackoverflow.com/questions/3274385/how-to-limit-memory-of-a-os-x-program-ulimit-v-neither-m-are-working
 > [3]
 > 
https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages_iPhoneOS/man2/getrlimit.2.html


...

In an ideal world, some of us,
 from R core, Jeroen, Kirill, ,
 maintainer("microbenchmark"), ...
would sit together and devise an R function interface (based on
low level platform specific interfaces, specifically for at least
Linux/POSIX-compliant, Mac, and Windows) which would allow
something  like your rlimit(..) calls below.

We'd really need something to work on all platforms ideally,
to be used by R package maintainers
and possibly even better by R itself at startup, setting a
reasonable memory cap - which the user could raise even to +Inf (or lower
even more).

I haven't found a Windows API that allows limiting the address space, 
only one that limits the working set size; it seems likely that this is 
the best we can get on OS X, too, but then my experience with OS X is 
very limited.


mallinfo() is used on Windows and seems to be available on Linux, too, 
but not on OS X.



-Kirill



Re: [Rd] R process killed when allocating too large matrix (Mac OS X)

2016-05-11 Thread Kirill Müller
My ulimit package exposes this API ([1], should finally submit it to 
CRAN); unfortunately this very API seems to be unsupported on OS X 
[2,3]. Last time I looked into it, neither of the documented settings 
achieved the desired effect.



-Kirill


[1] http://krlmlr.github.io/ulimit
[2] 
http://stackoverflow.com/questions/3274385/how-to-limit-memory-of-a-os-x-program-ulimit-v-neither-m-are-working
[3] 
https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages_iPhoneOS/man2/getrlimit.2.html



On 10.05.2016 01:08, Jeroen Ooms wrote:

On 05/05/2016 10:11, Uwe Ligges wrote:

Actually this also happens under Linux and I had my R processes killed
more than once (and much worse also other processes so that we had to
reboot a server, essentially).

I found that setting RLIMIT_AS [1] works very well on Linux. But this
requires that you cap memory to some fixed value.


library(RAppArmor)
rlimit_as(1e9)
rnorm(1e9)

Error: cannot allocate vector of size 7.5 Gb

The RAppArmor package has many other utilities to protect your server
such from a mis-behaving process such as limiting cpu time
(RLIMIT_CPU), fork bombs (RLIMIT_NPROC) and file sizes (RLIMIT_FSIZE).

[1] http://linux.die.net/man/2/getrlimit



[Rd] Regression in match() in R 3.3.0 when matching strings with different character encodings

2016-05-09 Thread Kirill Müller

Hi


I think the following behavior is a regression from R 3.2.5:

> match(iconv(c("\u00f8", "A"), from = "UTF8", to = "latin1"), "\u00f8")
[1]  1 NA
> match(iconv(c("\u00f8"), from = "UTF8", to = "latin1"), "\u00f8")
[1] NA
> match(iconv(c("\u00f8"), from = "UTF8", to = "latin1"), "\u00f8", incomparables = NA)
[1] 1

I'm seeing this in R 3.3.0 on both Windows and Ubuntu 15.10.

The specific behavior makes me think this is related to the following 
NEWS entry:


match(x, table) is faster (sometimes by an order of magnitude) when x is 
of length one and incomparables is unchanged (PR#16491).
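Until the regression is fixed, one workaround is to normalize both arguments to a common encoding before matching (a sketch; this sidesteps rather than diagnoses the fast-path bug):

```r
# Translate both sides to UTF-8 so encoding differences cannot affect
# the comparison, even for the length-one fast path.
x <- iconv(c("\u00f8"), from = "UTF-8", to = "latin1")
match(enc2utf8(x), enc2utf8("\u00f8"))
```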



Best regards

Kirill



Re: [Rd] S3 dispatch for S4 subclasses only works if variable "extends" is accessible from global environment

2016-04-19 Thread Kirill Müller
Thanks for looking into it, your approach sounds good to me. See also 
R_has_methods_attached() 
(https://github.com/wch/r-source/blob/42ecf5f492a005f5398cbb4c9becd4aa5af9d05c/src/main/objects.c#L258-L265).


I'm fine with Rscript not loading "methods", as long as everything works 
properly with "methods" loaded but not attached.



-Kirill


On 19.04.2016 04:10, Michael Lawrence wrote:

Right, the methods package is not attached by default when running R
with Rscript. We should probably remove that special case, as it
mostly just leads to confusion, but that won't happen immediately.

For now, the S4_extends() should probably throw an error when the
methods namespace is not loaded. And the check should be changed to
directly check whether R_MethodsNamespace has been set to something
other than the default (R_GlobalEnv). Agreed?

On Mon, Apr 18, 2016 at 4:35 PM, Kirill Müller
<kirill.muel...@ivt.baug.ethz.ch> wrote:

Scenario: An S3 method is declared for an S4 base class but called for an
instance of a derived class.

Steps to reproduce:


Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <-
function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())"

Error in UseMethod("test", x) :
   no applicable method for 'test' applied to an object of class "lsyMatrix"
Calls: 
1: MatrixDispatchTest::test(Matrix::Matrix())


Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x);
test.Matrix <- function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())"

[1] "Hi"

To me, it looks like a sanity check in line 655 of src/main/attrib.c is
making wrong assumptions, but there might be other reasons.
(https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/main/attrib.c#L655-L656)

Same behavior in R 3.2.4, R 3.2.5 and R-devel r70420.


Best regards

Kirill


Re: [Rd] S3 dispatch for S4 subclasses only works if variable "extends" is accessible from global environment

2016-04-18 Thread Kirill Müller

Please omit "MatrixDispatchTest::" from the test scripts:

Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <- 
function(x) 'Hi'; test(Matrix::Matrix())"


Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x); 
test.Matrix <- function(x) 'Hi'; test(Matrix::Matrix())"



-Kirill


On 19.04.2016 01:35, Kirill Müller wrote:
Scenario: An S3 method is declared for an S4 base class but called for 
an instance of a derived class.


Steps to reproduce:

> Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <- 
function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())"

Error in UseMethod("test", x) :
  no applicable method for 'test' applied to an object of class 
"lsyMatrix"

Calls: 
1: MatrixDispatchTest::test(Matrix::Matrix())

> Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x); 
test.Matrix <- function(x) 'Hi'; 
MatrixDispatchTest::test(Matrix::Matrix())"

[1] "Hi"

To me, it looks like a sanity check in line 655 of src/main/attrib.c 
is making wrong assumptions, but there might be other reasons. 
(https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/main/attrib.c#L655-L656)


Same behavior in R 3.2.4, R 3.2.5 and R-devel r70420.


Best regards

Kirill



[Rd] S3 dispatch for S4 subclasses only works if variable "extends" is accessible from global environment

2016-04-18 Thread Kirill Müller
Scenario: An S3 method is declared for an S4 base class but called for 
an instance of a derived class.


Steps to reproduce:

> Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <- 
function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())"

Error in UseMethod("test", x) :
  no applicable method for 'test' applied to an object of class "lsyMatrix"
Calls: 
1: MatrixDispatchTest::test(Matrix::Matrix())

> Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x); 
test.Matrix <- function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())"

[1] "Hi"

To me, it looks like a sanity check in line 655 of src/main/attrib.c is 
making wrong assumptions, but there might be other reasons. 
(https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/main/attrib.c#L655-L656)


Same behavior in R 3.2.4, R 3.2.5 and R-devel r70420.


Best regards

Kirill



Re: [R-pkg-devel] Scripts to generate data objects

2016-03-31 Thread Kirill Müller
The devtools::use_data_raw() function creates a "data-raw" directory for 
this purpose, and adds it to .Rbuildignore so that it's not included in 
the built package. Your scripts can then write the data to the proper 
place using devtools::use_data().
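Under the hood, devtools::use_data() essentially save()s the object into data/; a minimal sketch of the same effect (paths here are illustrative, using a temporary directory):

```r
# Save a data set into a package's data/ directory, as use_data() does.
pkg_dir <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)
datasetone <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
save(datasetone, file = file.path(pkg_dir, "data", "datasetone.RData"))
```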



-Kirill


On 30.03.2016 14:03, Iago Mosqueira wrote:

Hello,

What is the best way of keeping R scripts that are used to generate the data 
files in the data/ folder? These are not meant to be available to the user, but 
I would like to keep them in the package itself. Right now I am storing them 
inside data/, for example PKG/data/datasetone.R to create 
PKG/data/datasetone.RData, and then adding those R files to .Rbuildignore.

Are there any other sensible ways of doing this?

Thanks,


Iago






[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel




[Rd] DESCRIPTION file: Space after colon mandatory?

2016-03-29 Thread Kirill Müller
According to R-exts, DESCRIPTION is a DCF variant, and " Fields start 
with an ASCII name immediately followed by a colon: the value starts 
after the colon and a space." However, according to the linked 
https://www.debian.org/doc/debian-policy/ch-controlfields.html, 
horizontal space before and after a value is trimmed; this is also the 
behavior of read.dcf().
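read.dcf() itself, at least, accepts a missing space after the colon and trims the surrounding whitespace, which is easy to check:

```r
# A two-field DCF file: no space after the first colon, extra
# whitespace around the second value.
f <- tempfile()
writeLines(c("Package:pkg", "Title:   A Title   "), f)
d <- read.dcf(f)
d[, "Package"]  # "pkg"
d[, "Title"]    # "A Title"
```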


Is this an omission in the documentation, or is the space after the 
colon actually required? Thanks.



Best regards

Kirill



Re: [Rd] getParseData() for installed packages

2016-03-10 Thread Kirill Müller

On 10.03.2016 16:05, Duncan Murdoch wrote:

On 10/03/2016 9:53 AM, Kirill Müller wrote:


On 10.03.2016 15:49, Duncan Murdoch wrote:


I install using R CMD INSTALL ., and I have options(keep.source = TRUE,
keep.source.pkgs = TRUE) in my .Rprofile . The srcrefs are all there,
it's just that the parse data is not where I'd expect it to be.



Okay, I see what you describe.  I'm not going to have time to track 
this down for a while, so I'm going to post your message as a bug 
report, and hopefully will be able to get to it before 3.3.0.


Thanks. A related note: Would it be possible to make available all of 
first_byte/last_byte/first_column/last_column in the parse data, for 
easier srcref reconstruction?
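For reference, the columns getParseData() currently exposes are line/column based; the byte offsets asked for above are not among them (a quick check):

```r
# Inspect the parse data for a tiny expression.
p <- parse(text = "f <- function(x) x + 1", keep.source = TRUE)
pd <- utils::getParseData(p)
names(pd)  # includes line1, col1, line2, col2, id, parent, token, terminal, text
```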



-Kirill



Re: [Rd] getParseData() for installed packages

2016-03-10 Thread Kirill Müller



On 10.03.2016 15:49, Duncan Murdoch wrote:

On 10/03/2016 8:27 AM, Kirill Müller wrote:

I can't seem to reliably obtain parse data via getParseData() for
functions from installed packages. The parse data seems to be available
only for the *last* file in the package.

See [1] for a small example package with just two functions f and g in
two files a.R and b.R. See [2] for a documented test run on installed
package (Ubuntu 15.10, UTF-8 locale, R 3.2.3). Same behavior with
r-devel (r70303).

The parse data helps reliable coverage analysis [3]. Please advise.


You don't say how you built the package.  Parse data is omitted by 
default.


Duncan Murdoch


I install using R CMD INSTALL ., and I have options(keep.source = TRUE, 
keep.source.pkgs = TRUE) in my .Rprofile . The srcrefs are all there, 
it's just that the parse data is not where I'd expect it to be.



-Kirill



Best regards

Kirill


[1] https://github.com/krlmlr/covr.dummy
[2] http://rpubs.com/krlmlr/getParseData
[3] https://github.com/jimhester/covr/pull/154



[Rd] getParseData() for installed packages

2016-03-10 Thread Kirill Müller
I can't seem to reliably obtain parse data via getParseData() for 
functions from installed packages. The parse data seems to be available 
only for the *last* file in the package.


See [1] for a small example package with just two functions f and g in 
two files a.R and b.R. See [2] for a documented test run on installed 
package (Ubuntu 15.10, UTF-8 locale, R 3.2.3). Same behavior with 
r-devel (r70303).


The parse data helps reliable coverage analysis [3]. Please advise.


Best regards

Kirill


[1] https://github.com/krlmlr/covr.dummy
[2] http://rpubs.com/krlmlr/getParseData
[3] https://github.com/jimhester/covr/pull/154



Re: [R-pkg-devel] Namespace error

2016-02-23 Thread Kirill Müller
It's difficult to tell without seeing the source code, but the NAMESPACE 
you posted doesn't seem to contain "View". This file usually gets 
updated when you call devtools::document() or roxygen2::roxygenize(). 
What happens if you run one of these functions?



-Kirill


On 16.02.2016 18:14, Glenn Schultz wrote:

All

I am not sure why I am getting this error and I cannot find anything 
on the net other than to try restarting R. I am
using Roxygen2, and it clearly says don't edit by hand at the top of 
the namespace, so I am stuck as to what to do or look for.


Glenn

Error in namespaceExport(ns, exports) : undefined exports: View
Error: package or namespace load failed for ‘BondLab’
Execution halted
* checking whether the namespace can be loaded with stated 
dependencies ... WARNING

Error in namespaceExport(ns, exports) : undefined exports: View
Calls: loadNamespace ... namespaceImportFrom -> asNamespace -> 
loadNamespace -> namespaceExport

Execution halted

A namespace must be able to be loaded with just the base namespace
loaded: otherwise if the namespace gets loaded by a saved object, the
session will be unable to start.

Probably some imports need to be declared in the NAMESPACE file.
* checking whether the namespace can be unloaded cleanly ... OK
* checking dependencies in R code ... NOTE
Error: package or namespace load failed for ‘BondLab’
Call sequence:
2: stop(gettextf("package or namespace load failed for %s", 
sQuote(package)), call. = FALSE, domain = NA)



Here is my namespace:

# Generated by roxygen2: do not edit by hand

export(AtomsData)
export(BeginBal)
export(Bond)
export(BondAnalytics)
export(BondBasisConversion)
export(BondCashFlows)
export(CDR.To.MDR)
export(CIRBondPrice)
export(CIRSim)
export(CPR.To.SMM)
export(CalibrateCIR)
export(CashFlowTable)
export(CollateralGroup)
export(CusipRecord)
export(DollarRoll)
export(DollarRollAnalytics)
export(Effective.Convexity)
export(Effective.Duration)
export(Effective.Measure)
export(EndingBal)
export(EstimYTM)
export(Forward.Rate)
export(ForwardPassThrough)
export(HPISim)
export(Interest)
export(MBS)
export(MakeBondDetails)
export(MakeCollateral)
export(MakeMBSDetails)
export(MakeModelTune)
export(MakeRAID)
export(MakeRDME)
export(MakeScenario)
export(MakeSchedule)
export(MakeTranche)
export(ModelTune)
export(Mortgage.Monthly.Payment)
export(Mortgage.OAS)
export(MortgageCashFlow)
export(MortgageCashFlowArray)
export(MortgageCashFlow_Array)
export(MortgageRate)
export(Mtg.Scenario)
export(MtgRate)
export(MtgTermStructure)
export(PPC.Ramp)
export(PassThroughOAS)
export(PaymentDate)
export(PrepaymentAssumption)
export(RDMEData)
export(RDMEFactor)
export(REMICDeal)
export(REMICGroupConn)
export(REMICSchedules)
export(REMICWaterFall)
export(Rates)
export(ReadRAID)
export(Remain.Balance)
export(RemicStructure)
export(SMM.To.CPR)
export(SMMVector.To.CPR)
export(SaveCollGroup)
export(SaveMBS)
export(SaveModelTune)
export(SaveRAID)
export(SaveRDME)
export(SaveREMIC)
export(SaveScenario)
export(SaveSchedules)
export(SaveTranche)
export(SaveTranches)
export(ScenarioCall)
export(Sched.Prin)
export(SwapRateData)
export(TermStructure)
export(TimeValue)
export(Tranches)
export(ULTV)
export(bondprice)
exportClasses(AtomsAnalytics)
exportClasses(AtomsData)
exportClasses(AtomsScenario)
exportClasses(BondCashFlows)
exportClasses(BondDetails)
exportClasses(BondTermStructure)
exportClasses(MBSDetails)
exportClasses(TermStructure)
import(data.tree)
import(methods)
import(optimx)
importFrom(lubridate,"%m+%")
importFrom(lubridate,day)
importFrom(lubridate,month)
importFrom(lubridate,year)
importFrom(lubridate,years)
importFrom(termstrc,create_cashflows_matrix)
importFrom(termstrc,create_maturities_matrix)
importFrom(termstrc,estim_cs)
importFrom(termstrc,estim_nss)
importFrom(termstrc,forwardrates)
importFrom(termstrc,spotrates)
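The "undefined exports: View" error means that some NAMESPACE involved in loading declares View among its exports without defining or importing it; since the NAMESPACE shown above does not export View, the stale entry may live in an installed copy of a package. A hedged diagnostic sketch (using stats as a stand-in, since BondLab may not be installed on your machine):

```r
# List the exports an installed package's NAMESPACE file actually declares,
# to see whether a stale copy still exports View
ns <- parseNamespaceFile("stats", package.lib = file.path(R.home(), "library"))
"View" %in% ns$exports   # FALSE for stats; run the same check on BondLab
```

If the installed copy does export View, reinstalling after a fresh roxygen2 run should clear the stale entry.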
__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel



[Rd] R CMD check --as-cran without qpdf

2015-10-10 Thread Kirill Müller
Today, a package that has an HTML vignette (but no PDF vignette) failed 
R CMD check --as-cran on a system without qpdf. I think the warning 
originates here [1], due to a premature check for the existence of qpdf 
[2]. Setting R_QPDF=true (as in /bin/true) helped, but perhaps it's 
possible to check qpdf existence only when it matters.
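The suggested late check could look like this sketch (R_QPDF is the documented environment override; variable names are illustrative):

```r
# Only resolve qpdf when a size-reduction check is actually about to run;
# setting R_QPDF=true (i.e. /bin/true) remains a valid way to disable it
qpdf <- Sys.getenv("R_QPDF", "qpdf")
has_qpdf <- nzchar(Sys.which(qpdf))
if (!has_qpdf)
  message("'qpdf' is needed for checks on size reduction of PDFs")
```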


I have attached a patch (untested) that could serve as a starting point. 
The code links correspond to SVN revision 69500. Thanks.



Best regards

Kirill


[1] 
https://github.com/wch/r-source/blob/f42ee5e7ecf89a245afd6619b46483f1e3594ab7/src/library/tools/R/check.R#L322-L326, 

[2] 
https://github.com/wch/r-source/blob/f42ee5e7ecf89a245afd6619b46483f1e3594ab7/src/library/tools/R/check.R#L4426-L4428
diff --git src/library/tools/R/check.R src/library/tools/R/check.R
index a508453..e4e5027 100644
--- src/library/tools/R/check.R
+++ src/library/tools/R/check.R
@@ -319,11 +319,7 @@ setRlibs <-
  paste("  file", paste(sQuote(miss[f]), collapse = ", "),
"will not be installed: please remove it\n"))
 }
-if (dir.exists("inst/doc")) {
-if (R_check_doc_sizes) check_doc_size()
-else if (as_cran)
-warningLog(Log, "'qpdf' is needed for checks on size reduction of PDFs")
-}
+if (R_check_doc_sizes && dir.exists("inst/doc")) check_doc_size()
 if (dir.exists("inst/doc") && do_install) check_doc_contents()
 if (dir.exists("vignettes")) check_vign_contents(ignore_vignettes)
 if (!ignore_vignettes) {
@@ -2129,12 +2125,18 @@ setRlibs <-
 
 check_doc_size <- function()
 {
-## Have already checked that inst/doc exists and qpdf can be found
+## Have already checked that inst/doc exists
 pdfs <- dir('inst/doc', pattern="\\.pdf",
 recursive = TRUE, full.names = TRUE)
 pdfs <- setdiff(pdfs, "inst/doc/Rplots.pdf")
 if (length(pdfs)) {
 checkingLog(Log, "sizes of PDF files under 'inst/doc'")
+if (!nzchar(Sys.which(Sys.getenv("R_QPDF", "qpdf")))) {
+if (as_cran)
+warningLog(Log, "'qpdf' is needed for checks on size reduction of PDFs")
+return()
+}
+
 any <- FALSE
 td <- tempfile('pdf')
 dir.create(td)
@@ -4424,8 +4426,7 @@ setRlibs <-
 	config_val_to_logical(Sys.getenv("_R_CHECK_PKG_SIZES_", "TRUE")) &&
 nzchar(Sys.which("du"))
 R_check_doc_sizes <-
-	config_val_to_logical(Sys.getenv("_R_CHECK_DOC_SIZES_", "TRUE")) &&
-nzchar(Sys.which(Sys.getenv("R_QPDF", "qpdf")))
+	config_val_to_logical(Sys.getenv("_R_CHECK_DOC_SIZES_", "TRUE"))
 R_check_doc_sizes2 <-
 	config_val_to_logical(Sys.getenv("_R_CHECK_DOC_SIZES2_", "FALSE"))
 R_check_code_assign_to_globalenv <-

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] static pdf vignette

2015-02-27 Thread Kirill Müller

Perhaps the R.rsp package by Henrik Bengtsson [1,2] is an option.


Cheers

Kirill


[1] http://cran.r-project.org/web/packages/R.rsp/index.html
[2] https://github.com/HenrikBengtsson/R.rsp
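For reference, R.rsp's static ("asis") vignettes work roughly as follows (file names are illustrative; see the package documentation for the authoritative recipe): ship the prebuilt PDF plus a small .asis stub in vignettes/, and declare the builder in DESCRIPTION:

```
# vignettes/mypaper.pdf.asis
%\VignetteIndexEntry{My static PDF vignette}
%\VignetteEngine{R.rsp::asis}

# DESCRIPTION (relevant fields)
Suggests: R.rsp
VignetteBuilder: R.rsp
```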


On 27.02.2015 02:44, Wang, Zhu wrote:

Dear all,

In my package I have a computationally expensive Rnw file which can't pass R CMD 
check. Therefore I set eval=FALSE in the Rnw file. But I would like to have the 
pdf vignette generated by the Rnw file with eval=TRUE. It seems to me a static 
pdf vignette is an option. Any suggestions on this?

Thanks,

Zhu Wang


**Connecticut Children's Confidentiality Notice**

This e-mail message, including any attachments, is for...{{dropped:6}}


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] xtabs and NA

2015-02-13 Thread Kirill Müller

On 09.02.2015 16:59, Gabor Grothendieck wrote:

On Mon, Feb 9, 2015 at 8:52 AM, Kirill Müller
kirill.muel...@ivt.baug.ethz.ch wrote:
Passing table the output of model.frame would still allow the use of a 
formula interface:

mf <- model.frame(~ data, na.action = na.pass)
do.call(table, c(mf, useNA = "ifany"))

   a    b    c <NA>
   1    1    1    1


Fair enough, this qualifies as a workaround, and IMO this is how xtabs 
should handle it internally to allow writing xtabs(~data, na.action = 
na.pass) -- or at least xtabs(~data, na.action = na.pass, exclude = 
NULL) if backward compatibility is desired. Would anyone with write 
access to R's SVN repo care enough about this situation to review a 
patch? Thanks.



-Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] xtabs and NA

2015-02-09 Thread Kirill Müller

Hi


I haven't found a way to produce a tabulation from factor data with NA 
values using xtabs. Please find a minimal example below, it's also on 
R-pubs [1]. Tested with R 3.1.2 and R-devel r67720.


It doesn't seem to be documented explicitly that it's not supported. 
From reading the code [2] it looks like the relevant call to table() 
doesn't set the useNA parameter, which I think is necessary to make 
NAs show up in the result.


Am I missing anything? If this a bug -- would a patch be welcome? Do we 
need compatibility with the current behavior?


I'm aware of workarounds, I just prefer xtabs() over table() for its 
interface.


Thanks.


Best regards

Kirill


[1] http://rpubs.com/krlmlr/xtabs-NA
[2] 
https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/library/stats/R/xtabs.R#L60



data <- factor(letters[1:4], levels = letters[1:3])
data
## [1] a    b    c    <NA>
## Levels: a b c
xtabs(~data)
## data
## a b c
## 1 1 1
xtabs(~data, na.action = na.pass)
## data
## a b c
## 1 1 1
xtabs(~data, na.action = na.pass, exclude = numeric())
## data
## a b c
## 1 1 1
xtabs(~data, na.action = na.pass, exclude = NULL)
## data
## a b c
## 1 1 1
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
##  [3] LC_TIME=de_CH.UTF-8LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=de_CH.UTF-8LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=de_CH.UTF-8   LC_NAME=C
##  [9] LC_ADDRESS=C   LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics  grDevices utils datasets  methods base
##
## other attached packages:
## [1] magrittr_1.5ProjectTemplate_0.6-1.0
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.8evaluate_0.5.7  formatR_1.0.3 htmltools_0.2.6
## [5] knitr_1.9.2 rmarkdown_0.5.1 stringr_0.6.2 tools_3.1.2
## [9] ulimit_0.0-2

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to test impact of candidate changes to package?

2014-09-10 Thread Kirill Müller
If you don't intend to keep the old business logic in the long run, 
perhaps a version control system such as Git can help you. If you use it 
in single-user mode, you can think of it as a backup system where you 
manually create each snapshot and give it a name, but it actually can do 
much more. For your use case, you can open a new *branch* where you 
implement your changes, and implement your testing logic simultaneously 
in both branches (using *merge* operations). The system handles 
switching between branches, so you can really perform invasive changes, 
and revert if you find that a particular change breaks something.


RStudio has Git support, but you probably need to use the shell to 
create a branch. On Windows or OS X the GitHub client helps you to get 
started.
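The branch workflow above can be sketched like this (repository and branch names are hypothetical; a real package would already be a Git repository):

```shell
# Keep the old business logic on the base branch and develop the
# candidate change on its own branch; switch between them at will.
cd "$(mktemp -d)" && git init -q .
git config user.email you@example.com && git config user.name "You"
echo 'old logic' > logic.R && git add -A && git commit -qm 'baseline'
git checkout -qb candidate-logic             # implement the invasive change here
echo 'new logic' > logic.R && git commit -qam 'candidate change'
git checkout -q -                            # back to the baseline; old logic intact
cat logic.R                                  # prints: old logic
```

Merging the shared test harness into both branches keeps the comparison fair.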



Cheers

Kirill


On 09/10/2014 11:14 AM, Stephanie Locke wrote:

I have unit tests using testthat but these are typically of these types:
1) Check for correct calculation for a single set of valid inputs
2) Check for correct calculation for a larger set of valid inputs
3) Check for errors when providing incorrect inputs
4) Check for known frailties / past issues

This is more for where changes are needed to functions that apply various bits of 
business logic that can change over time, so there is no one answer. A unit 
test (at least as I understand it) can be worked through to make sure that given inputs, 
the output is computationally correct. What I'd like to do is assess the overall impact of a 
potential change by testing version 1 of a function in a package on a sample, then testing 
version 2 of the function on the same sample, and comparing the results.

My difficulty so far is that I'm reluctant to make this change invasively by 
overwriting the relevant files in the R directory and then, say, using devtools 
to load it and test it with testthat, as I risk producing incorrect states of my 
package and potentially releasing the wrong thing. My preference would be a 
non-invasive method. Currently, where I'm trying to do this non-invasively, I 
source a new version of the function stored in a separate directory, but some of 
the functions that depend on it continue to reference the package versions of 
the functions; this means that when I'm doing test #2 I have to load lots more 
functions and hope I've caught them all (or do some sort of dependency hunting 
programmatically).

I may be missing something about testthat, but what I'm doing now seems to be 
nowhere near optimal and I'd love to have a better solution.

Cheers

Stephanie Locke
BI & Credit Risk Analyst

-Original Message-
From: ONKELINX, Thierry [mailto:thierry.onkel...@inbo.be]
Sent: 10 September 2014 09:30
To: Stephanie Locke; r-devel@r-project.org
Subject: RE: How to test impact of candidate changes to package?

Dear Stephanie,

Have a look at the testthat package and the related article in the R Journal.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org]
On behalf of Stephanie Locke
Sent: Wednesday, 10 September 2014 9:55
To: r-devel@r-project.org
Subject: [Rd] How to test impact of candidate changes to package?

I use a package to contain simple functions that can be handled by unit tests 
for correctness and more complex functions that combine the simple functions 
with business logic.  Where there are proposals to change either the simple 
functions or the business logic, a sample needs to be run before the change and 
then after it to understand the impact of the change.

I do this currently by
1. Using Rmarkdown documents
2. Loading the package as-is
3. Getting my sample
4. Running my sample through the package as-is and outputting a table of results
5. source()ing new copies of functions
6. Running my sample again and outputting a table of results
7. Reloading the package and source()ing different copies of functions as required

I really don't think this is a good way to do this as it risks missing 
downstream dependencies of the functions I'm trying to load into the global 
namespace to test.

Has anyone else had to do this sort of testing before on their packages? How 
did you do it? Am I missing an obvious package / framework that can do this?

Cheers,
Steph

--
Stephanie 

Re: [Rd] Request to review a patch for rpart

2014-08-15 Thread Kirill Müller

Gabriel


Thanks for your feedback. Indeed, I was not particularly clear here. The 
empty model is just a very special case in a more general setting. I'd 
have to work around this deficiency in my code -- sure I can do that, 
but I thought a generic solution should be possible. In particular, I'm 
using predict.rpart(..., type = "prob") -- this just reflects the 
observed relative frequencies.



Cheers

Kirill


On 08/15/2014 06:44 PM, Gabriel Becker wrote:

Kirill,

Perhaps I'm just being obtuse, but what are you proposing rpart do in 
the case of an empty model?  Return a tree that always guesses the 
most common label, or doesn't guess at all (NA)? It doesn't seem like 
you'd need rpart for either of those.


~G


On Wed, Aug 13, 2014 at 3:51 AM, Kirill Müller 
kirill.muel...@ivt.baug.ethz.ch 
mailto:kirill.muel...@ivt.baug.ethz.ch wrote:


Dear list


For my work, it would be helpful if rpart worked seamlessly with
an empty model:

library(rpart); rpart(formula=y~0, data=data.frame(y=factor(1:10)))

Currently, an unrelated error (originating from na.rpart) is thrown.

At some point in the near future, I'd like to release a package to
CRAN which uses rpart and relies on that functionality. I have
prepared a patch (minor modifications at three places, and a test)
which I'd like to propose for inclusion in the next CRAN release
of rpart. The patch can be reviewed at
https://github.com/krlmlr/rpart/tree/empty-model, the files (based
on the current CRAN release 4.1-8) can be downloaded from
https://github.com/krlmlr/rpart/archive/empty-model.zip.

Thanks for your attention.


With kindest regards

Kirill Müller

__
R-devel@r-project.org mailto:R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Request to review a patch for rpart

2014-08-13 Thread Kirill Müller

Dear list


For my work, it would be helpful if rpart worked seamlessly with an 
empty model:


library(rpart); rpart(formula=y~0, data=data.frame(y=factor(1:10)))

Currently, an unrelated error (originating from na.rpart) is thrown.

At some point in the near future, I'd like to release a package to CRAN 
which uses rpart and relies on that functionality. I have prepared a 
patch (minor modifications at three places, and a test) which I'd like 
to propose for inclusion in the next CRAN release of rpart. The patch 
can be reviewed at https://github.com/krlmlr/rpart/tree/empty-model, the 
files (based on the current CRAN release 4.1-8) can be downloaded from 
https://github.com/krlmlr/rpart/archive/empty-model.zip.


Thanks for your attention.


With kindest regards

Kirill Müller

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] UTC time zone on Windows

2014-08-06 Thread Kirill Müller

Hi


I'm having trouble running R CMD build and check with UTC time zone 
setting in Windows Server 2012. I can't seem to get rid of the following 
warning:


  unable to identify current timezone 'C':
please set environment variable 'TZ'

However, setting TZ to either "Europe/London" or "GMT Standard Time" 
didn't help.


It seems to me that the warning originates in registryTZ.c 
(https://github.com/wch/r-source/blob/776708efe6003e36f02587ad47b2e19e2f69/src/extra/tzone/registryTZ.c#L363). 
I have therefore looked at 
HKLM\SYSTEM\CurrentControlSet\Control\TimeZoneInformation, to learn that 
TimeZoneKeyName is set to UTC. This time zone is not defined in 
TZtable, but is present in this machine's 
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones. (Also, the 
text of the warning permits the possibility that only the first 
character of the time zone is used for the warning message -- in the 
code, a const wchar_t* is used for a %s placeholder.)


Below is a link to the log of such a failing run. The first 124 lines 
are registry dumps, output of R CMD * is near the end of the log at 
lines 212 and 224.


https://ci.appveyor.com/project/krlmlr/r-appveyor/build/1.0.36

This happens with R 3.1.1 and R-devel r66309.

Is there a workaround I have missed, short of updating TZtable? How can 
I help updating TZtable? Thanks!



Cheers

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NOTE when detecting mismatch in output, and codes for NOTEs, WARNINGs and ERRORs

2014-04-10 Thread Kirill Müller


On 03/26/2014 06:46 PM, Paul Gilbert wrote:



On 03/26/2014 04:58 AM, Kirill Müller wrote:

Dear list


It is possible to store expected output for tests and examples. From the
manual: "If tests has a subdirectory Examples containing a file
pkg-Ex.Rout.save, this is compared to the output file for running the
examples when the latter are checked." And, earlier (written in the
context of test output, but apparently applies here as well): "...,
these two are compared, with differences being reported but not causing
an error."

I think a NOTE would be appropriate here, in order to be able to detect
this by only looking at the summary. Is there a reason for not flagging
differences here?


The problem is that differences occur too often because this is a 
comparison of characters in the output files (a diff). Any output that 
is affected by locale, node name or Internet downloads, time, host, or 
OS, is likely to cause a difference. Also, if you print results to a 
high precision you will get differences on different systems, 
depending on OS, 32 vs 64 bit, numerical libraries, etc. A better test 
strategy when it is numerical results that you want to compare is to 
do a numerical comparison and throw an error if the result is not 
good, something like


  r     <- result from your function
  rGood <- known good value
  fuzz  <- 1e-12  # tolerance

  if (fuzz < max(abs(r - rGood))) stop('Test xxx failed.')

It is more work to set up, but the maintenance will be less, 
especially when you consider that your tests need to run on different 
OSes on CRAN.


You can also use try() and catch error codes if you want to check those.



Thanks for your input.

To me, this is a different kind of test, for which I'd rather use the 
facilities provided by the testthat package. Imagine a function that 
operates on, say, strings, vectors, or data frames, and that is expected 
to produce completely identical results on all platforms -- here, a 
character-by-character comparison of the output is appropriate, and I'd 
rather see a WARNING or ERROR if something fails.


Perhaps this functionality can be provided by external packages like 
roxygen and testthat: roxygen could create the good output (if asked 
for) and set up a testthat test that compares the example run with the 
good output. This would duplicate part of the work already done by 
base R; the duplication could be avoided if there was a way to specify 
the severity of a character-level difference between output and expected 
output, perhaps by means of an .Rout.cfg file in DCF format:


OnDifference: mute|note|warning|error
Normalize: [R expression]
Fuzziness: [number of different lines that are tolerated]

On that note: Is there a convenient way to create the .Rout.save files 
in base R? By "convenient" I mean a single function call, not checking 
and manually copying as suggested here: 
https://stat.ethz.ch/pipermail/r-help/2004-November/060310.html .
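Absent a built-in one-liner, a small helper along these lines can snapshot the example output (hedged sketch; the paths follow the conventional R CMD check layout, and "pkg" is a placeholder name):

```r
# Copy the example output produced by R CMD check into the package's
# tests/Examples/<pkg>-Ex.Rout.save, creating directories as needed
save_example_output <- function(pkg, check_dir = paste0(pkg, ".Rcheck")) {
  out  <- file.path(check_dir, paste0(pkg, "-Ex.Rout"))
  dest <- file.path(pkg, "tests", "Examples", paste0(pkg, "-Ex.Rout.save"))
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  file.copy(out, dest, overwrite = TRUE)
}
```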



Cheers

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] NOTE when detecting mismatch in output, and codes for NOTEs, WARNINGs and ERRORs

2014-03-26 Thread Kirill Müller

Dear list


It is possible to store expected output for tests and examples. From the 
manual: If tests has a subdirectory Examples containing a file 
pkg-Ex.Rout.save, this is compared to the output file for running the 
examples when the latter are checked. And, earlier (written in the 
context of test output, but apparently applies here as well): ..., 
these two are compared, with differences being reported but not causing 
an error.


I think a NOTE would be appropriate here, in order to be able to detect 
this by only looking at the summary. Is there a reason for not flagging 
differences here?


The following is slightly related: Some compilers and static code 
analysis tools assign a numeric code to each type of error or warning 
they check for, and print it. Would that be possible to do for the 
anomalies detected by R CMD check? The most significant digit could 
denote the severity of the NOTE, WARNING or ERROR. This would further 
simplify (semi-)automated analysis of the output of R CMD check, e.g. in 
the context of automated tests.



Best regards

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Docker versus Vagrant for reproducability - was: The case for freezing CRAN

2014-03-22 Thread Kirill Müller


On 03/22/2014 02:10 PM, Nathaniel Smith wrote:

On 22 Mar 2014 12:38, Philippe GROSJEAN philippe.grosj...@umons.ac.be
wrote:

On 21 Mar 2014, at 20:21, Gábor Csárdi csardi.ga...@gmail.com wrote:

In my opinion it is somewhat cumbersome to use this for everyday work,
although good virtualization software definitely helps.

Gabor


Additional info: you access R inside the VM from the host via ssh.

You can enable x11 forwarding there and you also got GUI stuff. It works
like a charm, but there are still some problems on my side when I try to
disconnect and reconnect to the same R process. I can solve this with, say,
screen. However, if any X11 window is displayed while I disconnect, R
crashes immediately on reconnection.

You might find the program 'xpra' useful. It's like screen, but for x11
programs.

-n
I second that. However, by default, xpra and GNU Screen are not aware of 
each other. To connect to xpra from within GNU Screen, you usually need 
to set the DISPLAY environment variable manually. I have described a 
solution that automates this, so that GUI applications just work from 
within GNU Screen and also survive a disconnect: 
http://krlmlr.github.io/integrating-xpra-with-screen/ .



-Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Deep copy of factor levels?

2014-03-17 Thread Kirill Müller

Hi


It seems that selecting an element of a factor will copy its levels 
(Ubuntu 13.04, R 3.0.2). Below is the output of a script that creates a 
factor with 1e4 elements and then calls as.list() on it. The new 
object seems to use more than 700 MB, and inspection of the levels of 
the individual elements of the list suggests that they are distinct objects.


Perhaps some performance gain could be achieved by copying the levels 
by reference, but I don't know R internals well enough to see if it's 
possible. Is there a particular reason for creating a full copy of the 
factor levels?


This has come up when looking at the performance of rbind.fill (in the 
plyr package) with factors: https://github.com/hadley/plyr/issues/206 .



Best regards

Kirill



> gc()
          used (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells  325977 17.5    1074393  57.4  10049951  536.8
Vcells 4617168 35.3   87439742 667.2 204862160 1563.0
> system.time(x <- factor(seq_len(1e4)))
   user  system elapsed
  0.008   0.000   0.007
> system.time(xx <- as.list(x))
   user  system elapsed
  4.263   0.000   4.322
> gc()
               used  (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells       385991  20.7    1074393  57.4  10049951  536.8
Vcells    104672187 798.6  112367694 857.3 204862160 1563.0
> .Internal(inspect(levels(xx[[1]])))
@387f620 16 STRSXP g1c7 [MARK,NAM(2)] (len=10000, tl=0)
  @144da4e8 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "1"
  @144da518 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "2"
  @27d1298 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "3"
  @144da548 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "4"
  @144da578 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "5"
  ...
> .Internal(inspect(levels(xx[[2]])))
@1b38cb90 16 STRSXP g1c7 [MARK,NAM(2)] (len=10000, tl=0)
  @144da4e8 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "1"
  @144da518 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "2"
  @27d1298 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "3"
  @144da548 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "4"
  @144da578 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "5"
  ...
  ...
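One hedged way to sidestep the copies until that changes: keep the integer codes plus a single shared levels vector instead of calling as.list() on the factor (illustrative helper, small stand-in size):

```r
# Split a factor into codes + one shared levels vector; elements are
# reconstructed on demand instead of each carrying a copy of the levels
x     <- factor(seq_len(10))       # stand-in for the 1e4-element factor
codes <- as.integer(x)             # one small integer per element
lv    <- levels(x)                 # the levels, stored exactly once
elem  <- function(i) lv[codes[i]]  # reconstruct element i on demand
elem(3)
```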

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Detect a terminated pipe

2014-03-14 Thread Kirill Müller

Hi

Is there a way to detect that the process that corresponds to a pipe has 
ended? On my system (Ubuntu 13.04), I see


> p <- pipe("true", "w"); Sys.sleep(1); system("ps -elf | grep true | grep -v grep"); isOpen(p)

[1] TRUE

The "true" process has long ended (as the filtered ps system call emits 
no output), yet R believes that the pipe p is open.


Thanks for your input.


Best regards

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Detect a terminated pipe

2014-03-14 Thread Kirill Müller

On 03/14/2014 03:54 PM, Simon Urbanek wrote:

As far as R is concerned, the connection is open. In addition, pipes exist even 
without the process - you can close one end of a pipe and it will still exist 
(that’s what makes pipes useful, actually, because you can choose to close 
arbitrary combination of the R/W ends). Detecting that the other end of the 
pipe has closed is generally done by sending/receiving data to/from the end of 
interest - i.e. reading from a pipe that has closed the write end on the other 
side will yield 0 bytes read. Writing to a pipe that has closed the read end on 
the other side will yield SIGPIPE error (note that for text connections you 
have to call flush() to send the buffer):


> p = pipe("true", "r")
> readLines(p)
character(0)
> close(p)
> p = pipe("true", "w")
> writeLines("", p)
> flush(p)
Error in flush.connection(p) : ignoring SIGPIPE signal
> close(p)
Thanks for your reply. I tried this in an R console and received the 
error, just like you described. Unfortunately, the error is not thrown 
when trying the same in RStudio. Any ideas?



Cheers

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] $new cannot be accessed when running from Rscript and methods package is not loaded

2014-02-10 Thread Kirill Müller

Hi


Accesses the $new method for a class defined in a package fails if the 
methods package is not loaded. I have created a test package with the 
following single code file:


newTest <- function() {
  cl <- get("someClass")
  cl$new
}

someClass <- setRefClass("someClass")

(This is similar to code actually used in the testthat package.)

If methods is not loaded, executing the newTest function fails in the 
following scenarios:


- Package depends on methods (scenario "depend")
- Package imports methods and imports either the setRefClass function 
(scenario "import-setRefClass") or the whole package (scenario 
"import-methods")

It succeeds if the newTest function calls require(methods) (scenario 
"require").
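A minimal self-contained sketch of that last workaround, under the assumption that loading methods up front is acceptable (class and function names are from the test package):

```r
# Load 'methods' explicitly so the reference-class machinery behind $new
# is available even when the script runner does not attach it
requireNamespace("methods")
someClass <- methods::setRefClass("someClass")

newTest <- function() {
  requireNamespace("methods")   # the workaround from scenario "require"
  cl <- get("someClass")
  cl$new
}

inst <- newTest()()             # create an instance via the generator
```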


The script at 
https://raw2.github.com/krlmlr/methodsTest/master/test-all.sh creates an 
empty user library in subdirectory r-lib of the current directory, 
installs devtools, and tests the four scenarios by repeatedly installing 
the corresponding version of the package and trying to execute newTest() 
from Rscript. I have attached the output. The package itself is on 
GitHub: https://github.com/krlmlr/methodsTest , there is a branch for 
each scenario.


Why does it seem to be necessary to load the methods package here?


Best regards

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] $new cannot be accessed when running from Rscript and methods package is not loaded

2014-02-10 Thread Kirill Müller

On 02/11/2014 03:22 AM, Peter Meilstrup wrote:

Because depends is treated incorrectly (if I may place a value
judgement on it). I had an earlier thread on this, not sure if any
changes have taken place since then:

http://r.789695.n4.nabble.com/Dependencies-of-Imports-not-attached-td4666529.html

Peter


Thanks. Could you please clarify: The thread you mention refers to a 
scenario where a package uses another package that depends on methods. 
The issue I'm describing doesn't have this, there is only a single 
package that tries to use $new and fails. ?


On that note: A related discussion on R-devel advises depending on 
methods, but this doesn't seem to be enough in this case:


http://r.789695.n4.nabble.com/advise-on-Depends-td4678930.html


-Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] file.exists does not like path names ending in /

2014-01-17 Thread Kirill Müller

On 01/17/2014 02:56 PM, Gabor Grothendieck wrote:

At the moment I am using this to avoid the
problem:

File.exists <- function(x) {
  if (.Platform$OS.type == "windows" && grepl("[/\\]$", x)) {
    file.exists(dirname(x))
  } else file.exists(x)
}

but it would be nice if that could be done by file.exists itself.
I think that ignoring a terminal slash/backslash on Windows would do no 
harm: It would improve consistency between platforms, and perhaps nobody 
really relies on the current behavior. Would shorten the documentation, too.



-Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] file.exists does not like path names ending in /

2014-01-17 Thread Kirill Müller

On 01/17/2014 07:35 PM, William Dunlap wrote:

I think that ignoring a terminal slash/backslash on Windows would do no
harm:

Windows makes a distinction between "C:" and "C:/": the former is
not a file (or directory) and the latter is.
But, according to the documentation, neither would be currently detected 
by file.exists, while the latter is a directory, as you said, and should 
be detected as such.



-Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Sweave trims console output in tex mode

2014-01-03 Thread Kirill Müller

On 01/03/2014 02:34 AM, Duncan Murdoch wrote:

Carriage returns usually don't matter in LaTeX
I'd rather say they do. One is like a space, two or more end a paragraph 
and start a new one. If newlines are stripped away, the meaning of the 
TeX code can change, in some cases dramatically (e.g. if comments are 
written to the TeX code).


Also, I don't understand why the option is called strip.white, at least 
for results=tex. The docs say that blank lines at the beginning and end 
of output are removed, but the observed behavior is to remove the 
terminating carriage return of the output.



-Kirill



Re: [Rd] Sweave trims console output in tex mode

2014-01-03 Thread Kirill Müller
I'm sorry, I didn't mean to be rude. Do you prefer including the entire 
original message when replying? Or perhaps I misunderstood you when you 
wrote:


 Carriage returns usually don't matter in LaTeX, so I didn't even know 
about this option, though I use results=tex quite often. I had to look 
at the source to see where the newlines were going, and saw it there.


Could you please clarify? Thanks.


-Kirill


On 01/03/2014 11:39 AM, Duncan Murdoch wrote:

It's dishonest to quote me out of context.

Duncan Murdoch

On 14-01-03 3:40 AM, Kirill Müller wrote:

On 01/03/2014 02:34 AM, Duncan Murdoch wrote:

Carriage returns usually don't matter in LaTeX

I'd rather say they do. One is like a space, two or more end a paragraph
and start a new one. If newlines are stripped away, the meaning of the
TeX code can change, in some cases dramatically (e.g. if comments are
written to the TeX code).

Also, I don't understand why the option is called strip.white, at least
for results=tex. The docs say that blank lines at the beginning and end
of output are removed, but the observed behavior is to remove the
terminating carriage return of the output.


-Kirill







Re: [Rd] Sweave trims console output in tex mode

2014-01-03 Thread Kirill Müller

On 01/03/2014 01:06 PM, Duncan Murdoch wrote:

On 14-01-03 5:47 AM, Kirill Müller wrote:

I'm sorry, I didn't mean to be rude. Do you prefer including the entire
original message when replying? Or perhaps I misunderstood you when you
wrote:


You don't need to include irrelevant material in your reply, but you 
should include explanatory material when you are arguing about a 
particular claim.  If you aren't sure whether it is relevant or not, 
then you should probably ask for clarification rather than arguing 
with the claim.


Thanks. In the future, I'll quote at least full sentences and everything 
they refer to, to avoid confusion and make sure that context is maintained.


Carriage returns usually don't matter in LaTeX, so I didn't even know
about this option, though I use results=tex quite often. I had to look
at the source to see where the newlines were going, and saw it there.

Could you please clarify? Thanks.


Single carriage returns are usually equivalent to spaces. Multiple 
carriage returns separate paragraphs, but they are rare in code chunk 
output in my Sweave usage.  I normally put plain text in the LaTeX 
part of the Sweave document.


Indeed, it only makes a difference for code that generates large 
portions of LaTeX (such as tikzDevice).


I have checked my own .Rnw files, and I have used results=tex about 
600 times, but never used strip.white.


I've also looked at the .Rnw files in CRAN packages, and 
strip.white=true and strip.white=all are used there about 140 times, 
but strip.white=false is only used 10 times.  I think only one package 
(SweaveListingUtils) uses strip.white=false in combination with 
results=tex.


So while I agree Martin's adaptive option would have been a better 
default than true, I think it would be more likely to cause trouble 
than to solve it.


I agree, given this data and considering that trimming the terminal 
newline can be considered a feature. Perhaps comments are the only use 
case where the newline is really important. But then I don't see how to 
reliably detect comments, as the catcode for % can be changed, e.g., in 
a verbatim environment. I'll consider printing a \relax after the 
comment in tikzDevice, this should be robust and sufficient.
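The \relax idea can be sketched as follows (illustrative only, not tikzDevice's actual code):

```r
# Emit a TeX comment followed by an explicit \relax on its own line.
# Even if Sweave strips the terminating newline, the comment can then
# only swallow the harmless \relax, not the code that follows it.
cat("% comment emitted by the device\n\\relax\n")
```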



-Kirill



[Rd] Sweave trims console output in tex mode

2014-01-02 Thread Kirill Müller

Hi


In the example .Rnw file below, only the newline between c and d is 
visible in the resulting .tex file after running R CMD Sweave. What is 
the reason for this behavior? Newlines are important in LaTeX and should 
be preserved. In particular, this behavior leads to incorrect LaTeX code 
generated when using tikz(console=TRUE) inside a Sweave chunk, as shown 
in the tikzDevice vignette.


A similar question has been left unanswered before: 
https://stat.ethz.ch/pipermail/r-help/2010-June/242019.html . I am well 
aware of knitr, I'm looking for a solution for Sweave.



Cheers

Kirill


\documentclass{article}
\begin{document}
<<inline,echo=FALSE,results=tex>>=
cat("a\n")
cat("b\n \n")
cat("c\nd")
@
\end{document}



Re: [Rd] Sweave trims console output in tex mode

2014-01-02 Thread Kirill Müller

On 01/03/2014 01:45 AM, Duncan Murdoch wrote:
You are running with the strip.white option set to TRUE.  That strips 
blank lines at the beginning and end of each output piece.  Just set 
strip.white=FALSE.
Thanks, the code below works perfectly. I have also found the 
documentation in ?RweaveLatex .


I'm not sure if the default setting is sensible for results=tex, 
though. Has this changed in the recent past?



-Kirill


\documentclass{article}
\begin{document}
<<inline,echo=FALSE,results=tex,strip.white=FALSE>>=
cat("a\n")
cat("b\n \n")
cat("c\nd")
@
\end{document}



Re: [Rd] Sweave trims console output in tex mode

2014-01-02 Thread Kirill Müller

On 01/03/2014 01:59 AM, Duncan Murdoch wrote:
But results=tex is not the default.  Having defaults for one option 
depend on the setting for another is confusing, so I think the current 
setting is appropriate. 
True. On the other hand, I cannot imagine that results=tex is useful 
at all without strip.white=FALSE. If the strip.white option 
auto-adjusted, things would just work. Anyway, I'm not a very active 
user of Sweave.



-Kirill



Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree

2013-12-13 Thread Kirill Müller

Gabor

I agree with you. There's Travis CI, and r-travis -- an attempt to 
integrate R package testing with Travis. Pushing back to GitHub is 
possible, but the setup is somewhat difficult. Also, this can be subject 
to race conditions because each push triggers a test run and they can 
happen in parallel even for the same repository. How do you handle branches?


It would be really great to be able to execute custom R code before 
building. Perhaps in a PreBuild: section in DESCRIPTION?



Cheers

Kirill



On 12/12/2013 02:21 AM, Gábor Csárdi wrote:

Hi,

this is maybe mostly a personal preference, but I prefer not to put
generated files in the vc repository. Changes in the generated files,
especially if there are many of them, pollute the diffs and make them
less useful.

If you really want to be able to install the package directly from
github, one solution is to
1. create another repository, that contains the complete generated
package, so that install_github() can install it.
2. set up a CI service, that can download the package from github,
build the package or the generated files (check the package, while it
is at it), and then push the build stuff back to github.
3. set up a hook on github, that invokes the CI after each commit.

I have used this setup in various projects with jenkins-ci and it
works well. Diffs are clean, the package is checked and built
frequently, and people can download it without having to install the
tools that generate the generated files.

The only downside is that you need to install a CI, so you need a
server for that. Maybe you can do this with travis-ci, maybe not, I
am not familiar with it that much.

Best,
Gabor

On Wed, Dec 11, 2013 at 7:39 PM, Kirill Müller
kirill.muel...@ivt.baug.ethz.ch wrote:

Hi

Quite a few R packages are now available on GitHub long before they appear
on CRAN, installation is simple thanks to devtools::install_github().
However, it seems to be common practice to keep the .Rd files (and NAMESPACE
and the Collate section in the DESCRIPTION) in the Git tree, and to manually
update it, even if they are autogenerated from the R code by roxygen2. This
requires extra work for each update of the documentation and also binds
package development to a specific version of roxygen2 (because otherwise
lots of bogus changes can be added by roxygenizing with a different
version).

What options are there to generate the .Rd files during build/install? In
https://github.com/hadley/devtools/issues/43 the issue has been discussed,
perhaps it can be summarized as follows:

- The devtools package is not the right place to implement
roxygenize-before-build
- A continuous integration service would be better for that, but currently
there's nothing that would be easy to use
- Roxygenizing via src/Makefile could work but requires further
investigation and an installation of Rtools/xcode on Windows/OS X

Especially the last point looks interesting to me, but since this is not
widely used there must be pitfalls I'm not aware of. The general idea would
be:

- Place code that builds/updates the .Rd and NAMESPACE files into
src/Makefile
- Users installing the package from source will require infrastructure
(Rtools/make)
- For binary packages, the .Rd files are already generated and added to the
.tar.gz during R CMD build before they are submitted to CRAN/WinBuilder, and
they are also generated (in theory) by R CMD build --binary

I'd like to hear your opinion on that. I have also found a thread on package
development workflow
(https://stat.ethz.ch/pipermail/r-devel/2011-September/061955.html) but
there was nothing on un-versioning .Rd files.


Cheers

Kirill



--
_
ETH Zürich
Institute for Transport Planning and Systems
HIL F 32.2
Wolfgang-Pauli-Str. 15
8093 Zürich

Phone:   +41 44 633 33 17
Fax: +41 44 633 10 57
Secretariat: +41 44 633 31 05
E-Mail:  kirill.muel...@ivt.baug.ethz.ch



Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree

2013-12-13 Thread Kirill Müller
On 12/13/2013 12:50 PM, Romain Francois wrote:
 Pushing back to github is not so difficult. See e.g 
 http://blog.r-enthusiasts.com/2013/12/04/automated-blogging.html
Thanks for the writeup, I'll try this. Perhaps it's better to push the 
results of `R CMD build`, though.
 You can manage branches easily in travis. You could for example decide 
 to do something different if you are on the master branch ...
That's right. But then no .Rd files are built when I'm on a branch, so I 
can't easily preview the result.

The ideal situation would be:

1. I manage only R source files on GitHub, not Rd files, NAMESPACE nor 
the Collate section of DESCRIPTION. Machine-readable instructions on 
how to build those are provided with the package.
2. Anyone can install from GitHub using devtools::install_github(). This 
also should work for branches, forks and pull requests.
3. I can build the package so that the result can be accepted by CRAN.

The crucial point on that list is point 2, the others I can easily solve 
myself.

The way I see it, point 2 can be tackled by extending devtools or 
extending the ways packages are built. Extending devtools seems to be 
the inferior approach, although, to be honest, I'd be fine with that as 
well.
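A branch-aware Travis push of the generated files might be sketched as this config fragment (the repository URL is a placeholder and GH_TOKEN is an assumed encrypted environment variable):

```shell
# Hypothetical Travis after_success step: regenerate and push Rd files
# back only from master, so branch builds still run checks but never
# publish, avoiding the race between parallel builds mentioned above.
if [ "$TRAVIS_BRANCH" = "master" ] && [ "$TRAVIS_PULL_REQUEST" = "false" ]; then
  Rscript -e 'roxygen2::roxygenize()'
  git add man NAMESPACE
  git commit -m "Regenerate documentation [ci skip]" || true
  git push --quiet "https://${GH_TOKEN}@github.com/user/repo.git" HEAD:master
fi
```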


-Kirill


Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree

2013-12-13 Thread Kirill Müller

Thanks a lot. This would indeed solve the problem. I'll try mkdist today ;-)

Is the NEWS file parsed before or after mkdist has been executed?

Would you be willing to share the code for the infrastructure, perhaps 
on GitHub?



-Kirill


On 12/13/2013 09:14 PM, Simon Urbanek wrote:

FWIW this is essentially what RForge.net provides. Each GitHub commit triggers 
a build (branches are supported, as the branch info is passed in the WebHook), 
which can be either a classic "R CMD build" or a custom shell script (hence 
you can do anything you want). The result is a tarball (which includes the 
generated files), and that tarball gets published in the R package repository. 
R CMD check is also run on the tarball and the results are published.
This way you don't need devtools; users can simply use install.packages() 
without requiring any additional tools.

There are some talks about providing the above as a cloud service, so that 
anyone can run and/or use it.

Cheers,
Simon



Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree

2013-12-13 Thread Kirill Müller

On 12/13/2013 06:09 PM, Brian Diggs wrote:
One downside I can see with this third approach is that by making the 
package documentation generation part of the build process, you must 
then make the package depend/require roxygen (or whatever tools you 
are using to generate documentation). This dependence, though, is just 
to build the package, not to actually use the package. And by pushing 
this dependency onto the end users of the package, you have 
transferred the problem you mentioned (... and also binds package 
development to a specific version of roxygen2 ...) to the many end 
users rather than the few developers.
That's right. As outlined in another message, roxygen2 would be required 
for building from the raw source (hosted on GitHub) but not for 
installing from a source tarball (which would contain the .Rd files). 
Not sure if that's possible, though.




[Rd] Strategies for keeping autogenerated .Rd files out of a Git tree

2013-12-11 Thread Kirill Müller

Hi

Quite a few R packages are now available on GitHub long before they 
appear on CRAN, installation is simple thanks to 
devtools::install_github(). However, it seems to be common practice to 
keep the .Rd files (and NAMESPACE and the Collate section in the 
DESCRIPTION) in the Git tree, and to manually update it, even if they 
are autogenerated from the R code by roxygen2. This requires extra work 
for each update of the documentation and also binds package development 
to a specific version of roxygen2 (because otherwise lots of bogus 
changes can be added by roxygenizing with a different version).


What options are there to generate the .Rd files during build/install? 
In https://github.com/hadley/devtools/issues/43 the issue has been 
discussed, perhaps it can be summarized as follows:


- The devtools package is not the right place to implement 
roxygenize-before-build
- A continuous integration service would be better for that, but 
currently there's nothing that would be easy to use
- Roxygenizing via src/Makefile could work but requires further 
investigation and an installation of Rtools/xcode on Windows/OS X


Especially the last point looks interesting to me, but since this is not 
widely used there must be pitfalls I'm not aware of. The general idea 
would be:


- Place code that builds/updates the .Rd and NAMESPACE files into 
src/Makefile
- Users installing the package from source will require infrastructure 
(Rtools/make)
- For binary packages, the .Rd files are already generated and added to 
the .tar.gz during R CMD build before they are submitted to 
CRAN/WinBuilder, and they are also generated (in theory) by R CMD build 
--binary
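A sketch of that pre-build step, under the assumptions above (the path and the roxygenize() call are illustrative; roxygen2 must already be installed on the building machine):

```shell
# Hypothetical recipe body for src/Makefile's first target: regenerate
# man/*.Rd and NAMESPACE from the roxygen comments before compilation
# starts. Run from src/, hence the ".." for the package root.
Rscript -e 'roxygen2::roxygenize("..")'
```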


I'd like to hear your opinion on that. I have also found a thread on 
package development workflow 
(https://stat.ethz.ch/pipermail/r-devel/2011-September/061955.html) but 
there was nothing on un-versioning .Rd files.



Cheers

Kirill



[Rd] Trouble running Rtools31 on Wine

2013-11-15 Thread Kirill Müller
Hi

An attempt to use R and Rtools in Wine fails, see the bug report to Wine:

http://bugs.winehq.org/show_bug.cgi?id=34865

The people there say that Rtools uses an outdated Cygwin DLL with a 
custom patch. Is there any chance we can upgrade our Cygwin DLL to a 
supported upstream version? Thanks.


Cheers

Kirill

