[Rd] memory allocation if multiple names point to the same object

2014-06-02 Thread Dénes Tóth


Hi,

Please consider the following code:

a <- seq.int(10)  # create a
tracemem(a)
a[1:4] <- 4:1  # no internal copy
b <- a         # no internal copy
b[1:4] <- 1:4  # copy, b is not a any more
a[1:4] <- 1:4  # copy, but why?

With results:
> a <- seq.int(10)
> tracemem(a)
[1] "<0x1792bc0>"
> a[1:4] <- 4:1
> b <- a
> b[1:4] <- 1:4
tracemem[0x1792bc0 -> 0x1792b58]:
> a[1:4] <- 1:4
tracemem[0x1792bc0 -> 0x1792af0]:


##

Could you provide a brief explanation, or point me to a source 
explaining why R needs a copy in the final step?



Best,
  Denes



 sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] memory allocation if multiple names point to the same object

2014-06-03 Thread Dénes Tóth

Dear Michael,

thank you for the enlightenment.  Just for the record, here is the 
solution that Michael referred to: 
http://developer.r-project.org/Refcnt.html


Best,
  Denes


On 06/03/2014 03:57 PM, Michael Lawrence wrote:

This is because R tracks the number of names bound to an object only up
to 2. Once the NAMED count reaches 2, it can no longer be decremented.
In this example, 'a' reaches a count of 2 (via the names 'a' and 'b'),
so R cannot know that only one name refers to the object at the end.

Luke has added reference counting to R 3.1 to get around these types of
problems. If you want to try it out, make the necessary change in
Rinternals.h and recompile. With reference counting, R knows that 'a'
only has one reference and avoids the copy.

> a <- seq.int(10)
> a[1:4] <- 4:1
> b <- a
> .Internal(inspect(a))
@2b71608 13 INTSXP g0c4 [REF(2)] (len=10, tl=0) 4,3,2,1,5,...
> b[1:4] <- 1:4
> .Internal(inspect(b))
@2b715a0 13 INTSXP g0c4 [REF(1)] (len=10, tl=0) 1,2,3,4,5,...
> .Internal(inspect(a))
@2b71608 13 INTSXP g0c4 [REF(1)] (len=10, tl=0) 4,3,2,1,5,...
> a[1:4] <- 1:4
> .Internal(inspect(a))
@2b71608 13 INTSXP g0c4 [REF(1)] (len=10, tl=0) 1,2,3,4,5,...



On Mon, Jun 2, 2014 at 9:31 AM, Dénes Tóth <toth.de...@ttk.mta.hu> wrote:




Re: [Rd] Request to speed up save()

2015-01-15 Thread Dénes Tóth



On 01/15/2015 01:45 PM, Stewart Morris wrote:

Hi,

I am dealing with very large datasets and it takes a long time to save a
workspace image.

The options to save compressed data are: gzip, bzip2 or xz, the
default being gzip. I wonder if it's possible to include the pbzip2
(http://compression.ca/pbzip2/) algorithm as an option when saving.

PBZIP2 is a parallel implementation of the bzip2 block-sorting file
compressor that uses pthreads and achieves near-linear speedup on SMP
machines. The output of this version is fully compatible with bzip2
v1.0.2 or newer

I tested this as follows with one of my smaller datasets, having only
read in the raw data:


# Dumped an ascii image
save.image(file='test', ascii=TRUE)

# At the shell prompt:
ls -l test
-rw-rw-r--. 1 swmorris swmorris 1794473126 Jan 14 17:33 test

time bzip2 -9 test
364.702u 3.148s 6:14.01 98.3%   0+0k 48+1273976io 1pf+0w

time pbzip2 -9 test
422.080u 18.708s 0:11.49 3836.2%   0+0k 0+1274176io 0pf+0w


As you can see, bzip2 on its own took over 6 minutes whereas pbzip2 took
11 seconds, admittedly on a 64 core machine (running at 50% load). Most
modern machines are multicore so everyone would get some speedup.

Is this feasible/practical? I am not a developer so I'm afraid this
would be down to someone else...


Take a look at the gdsfmt package. It supports the superfast LZ4 
compression algorithm and provides highly optimized functions to write 
to / read from disk.

https://github.com/zhengxwen/gdsfmt
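Until something like this lands in base R, one workaround (not from the thread, just a hedged sketch) is to stream an uncompressed `save()` through an external parallel compressor via a `pipe()` connection. This assumes a `pbzip2` binary on the PATH; the file name is illustrative:

```r
# Stream an uncompressed workspace image through pbzip2.
# Assumes the 'pbzip2' executable is installed and on the PATH.
con <- pipe("pbzip2 -9 > image.RData.bz2", "wb")
save(list = ls(), file = con, compress = FALSE)
close(con)

# Restoring works the same way in reverse:
con <- pipe("pbzip2 -dc image.RData.bz2", "rb")
load(con)
close(con)
```

The same pattern works with any streaming compressor (pigz, lz4, zstd), trading compression ratio against wall-clock time without touching R itself.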



Thoughts?

Cheers,

Stewart





Re: [Rd] R-devel does not update the C++ returned variables

2015-03-02 Thread Dénes Tóth



On 03/02/2015 04:37 PM, Martin Maechler wrote:



On 2 March 2015 at 09:09, Duncan Murdoch wrote:
| I generally recommend that people use Rcpp, which hides a lot of the
| details.  It will generate your .Call calls for you, and generate the
| C++ code that receives them; you just need to think about the real
| problem, not the interface.  It has its own learning curve, but I think
| it is easier than using the low-level code that you need to work with .Call.



Thanks for that vote, and I second that.



And these days the learning curve is a lot flatter than it was a decade ago:



R> Rcpp::cppFunction("NumericVector doubleThis(NumericVector x) { return(2*x); }")
R> doubleThis(c(1,2,3,21,-4))
[1]  2  4  6 42 -8
R>



That defined, compiled, loaded and run/illustrated a simple function.



Dirk


Indeed impressive,  ... and it also works with integer vectors,
something that is also not 100% trivial when working with compiled code.

When testing that, I went a step further:

## now test:
require(microbenchmark)
i <- 1:10


Note that the relative speed of the algorithms also depends on the size 
of the input vector. i + i becomes the winner for longer vectors (e.g. 
i <- 1:1e6), but a proper Rcpp version is still approximately twice as fast.


Rcpp::cppFunction("NumericVector doubleThisNum(NumericVector x) { 
return(2*x); }")
Rcpp::cppFunction("IntegerVector doubleThisInt(IntegerVector x) { 
return(2*x); }")

i <- 1:1e6
mb <- microbenchmark::microbenchmark(doubleThisNum(i), doubleThisInt(i), 
i*2, 2*i, i*2L, 2L*i, i+i, times = 100)

plot(mb, log = "y", notch = TRUE)



(mb <- microbenchmark(doubleThis(i), i*2, 2*i, i*2L, 2L*i, i+i, times = 2^12))
## Lynne (i7; FC 20), R Under development ... (2015-03-02 r67924):
## Unit: nanoseconds
##           expr min  lq      mean median   uq   max neval cld
##  doubleThis(i) 762 985 1319.5974   1124 1338 17831  4096   b
##          i * 2 124 151  258.4419    164  221     4  4096  a
##          2 * i 127 154  266.4707    169  216 20213  4096  a
##         i * 2L 143 164  250.6057    181  234 16863  4096  a
##         2L * i 144 177  269.5015    193  237 16119  4096  a
##          i + i 152 183  272.6179    199  243 10434  4096  a

plot(mb, log = "y", notch = TRUE)
## hmm, looks like even the simple arithm. differ slightly ...
##
## ==> zoom in:
plot(mb, log = "y", notch = TRUE, ylim = c(150, 300))

dev.copy(png, file = "mbenchm-doubling.png")
dev.off() # [ <- why do I need this here for png ??? ]
##-- see the appended *png graphic

Those who've learnt EDA or otherwise know about boxplot notches will
know that they provide somewhat informal but robust pairwise tests at an
approximate 5% level.
From these, one *could*, possibly wrongly, conclude that
'i * 2' is significantly faster than both 'i * 2L' and also
'i + i', which I find astonishing, given that i is integer here...

Probably no reason for deep thoughts here, but if someone is
enticed, this may be slightly interesting to read.

Martin Maechler, ETH Zurich





[Rd] possible bug in utils::removeSource - NULL argument is silently dropped

2017-12-11 Thread Dénes Tóth

Dear R-Core Team,

I found an unexpected behaviour in utils::removeSource (also present in 
r-devel as of today).


---

# create a function which accepts NULL argument
foo <- function(x, y) {
  if (is.null(y)) y <- "default foo"
  attr(x, "foo") <- y
  x
}

# create a function which utilizes 'foo'
testSrc <- function() {
  x <- 1:3
  x <- foo(x, NULL)
  x
}

# this works fine
testSrc()

# this fails
testNoSrc <- utils::removeSource(testSrc)
testNoSrc()

# removeSource removes NULL from the 'foo' call
print(testNoSrc)

---

I traced the bug back to this line in removeSource:

(line 33 in sourceutils.R)
part[[i]] <- recurse(part[[i]])

which should (IMHO) be:
part[i] <- list(recurse(part[[i]]))

---
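For background on why the one-line patch matters: the two subassignment forms differ exactly when the right-hand side is NULL. `[[<-` with a NULL value deletes the element, while `[<-` with list(NULL) stores a literal NULL; recurse() returns NULL for a NULL call argument, so the original line silently drops it. A minimal illustration (not from the original report):

```r
x <- list(a = 1, b = 2, c = 3)
x[["b"]] <- NULL              # deletes element "b" entirely
stopifnot(length(x) == 2L)

y <- list(a = 1, b = 2, c = 3)
y["b"] <- list(NULL)          # keeps element "b", now holding NULL
stopifnot(length(y) == 3L, is.null(y$b))
```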

# create a function with the above patch
rmSource <- function (fn) {
  stopifnot(is.function(fn))
  if (is.primitive(fn))
return(fn)
  attr(fn, "srcref") <- NULL
  attr(body(fn), "wholeSrcref") <- NULL
  attr(body(fn), "srcfile") <- NULL
  recurse <- function(part) {
if (is.name(part))
  return(part)
attr(part, "srcref") <- NULL
attr(part, "wholeSrcref") <- NULL
attr(part, "srcfile") <- NULL
if (is.language(part) && is.recursive(part)) {
  for (i in seq_along(part))
part[i] <- list(recurse(part[[i]]))
}
part
  }
  body(fn) <- recurse(body(fn))
  fn
}

# test
( testNoSrc2 <- rmSource(testSrc) )
testNoSrc2()


Regards,
Denes



[Rd] length of `...`

2018-05-03 Thread Dénes Tóth

Hi,


In some cases the number of arguments passed as ... must be determined 
inside a function, without evaluating the arguments themselves. I use 
the following construct:


dotlength <- function(...) length(substitute(expression(...))) - 1L

# Usage (returns 3):
dotlength(1, 4, something = undefined)

How can I define a method for length() which could be called directly on 
`...`? Or is it an intention to extend the base length() function to 
accept ellipses?
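For the record, base R gained a built-in for exactly this use case: `...length()` (available since R 3.5.0) returns the number of arguments in `...` without evaluating them. A quick sketch comparing it with the substitute() idiom:

```r
dotlength  <- function(...) length(substitute(expression(...))) - 1L
dotlength2 <- function(...) ...length()

# Neither form evaluates the arguments, so `undefined` never errors:
stopifnot(dotlength(1, 4, something = undefined) == 3L)
stopifnot(dotlength2(1, 4, something = undefined) == 3L)
```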



Regards,
Denes



Re: [Rd] True length - length(unclass(x)) - without having to call unclass()?

2018-09-01 Thread Dénes Tóth
The solution below introduces a dependency on data.table, but otherwise 
it does what you need:


---

# special method for Foo objects
length.Foo <- function(x) {
  length(unlist(x, recursive = TRUE, use.names = FALSE))
}

# an instance of a Foo object
x <- structure(list(a = 1, b = list(b1 = 1, b2 = 2)), class = "Foo")

# its length
stopifnot(length(x) == 3L)

# get its length as if it were a standard list
.length <- function(x) {
  cls <- class(x)
  # setattr() does not make a copy, but modifies by reference
  data.table::setattr(x, "class", NULL)
  # get the length
  len <- base::length(x)
  # re-set original classes
  data.table::setattr(x, "class", cls)
  # return the unclassed length
  len
}

# to check that we do not make unwanted changes
orig_class <- class(x)

# check that the address in RAM does not change
a1 <- data.table::address(x)

# 'unclassed' length
stopifnot(.length(x) == 2L)

# check that address is the same
stopifnot(a1 == data.table::address(x))

# check against original class
stopifnot(identical(orig_class, class(x)))

---


On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:

Is there a low-level function that returns the length of an object 'x'
- the length that for instance .subset(x) and .subset2(x) see? An
obvious candidate would be to use:

.length <- function(x) length(unclass(x))

However, I'm concerned that calling unclass(x) may trigger an
expensive copy internally in some cases.  Is that concern unfounded?

Thxs,

Henrik



Re: [Rd] ROBUSTNESS: x || y and x && y to give warning/error if length(x) != 1 or length(y) != 1

2018-08-30 Thread Dénes Tóth




On 08/30/2018 01:56 PM, Joris Meys wrote:

I have to agree with Emil here. && and || are short-circuited like in C and
C++. That means that

TRUE || c(TRUE, FALSE)
FALSE && c(TRUE, FALSE)

cannot give an error because the second part is never evaluated. Throwing a
warning or error for

c(TRUE, FALSE) || TRUE

would mean that the operator gives a different result depending on the
order of the objects, breaking the symmetry. Also that would be undesirable.


Note that `||` and `&&` have never been symmetric:

TRUE || stop() # returns TRUE
stop() || TRUE # returns an error




Regarding logical(0): per the documentation, it is indeed so that ||, &&
and isTRUE always return a length-one logical vector. Hence the NA.

On a sidenote: there is no such thing as a scalar in R. What you call
scalar, is really a length-one vector. That seems like a detail, but is
important in understanding why this admittedly confusing behaviour actually
makes sense within the framework of R imho. I do understand your objections
and suggestions, but it would boil down to removing short circuited
operators from R.

My 2 cents.
Cheers
Joris

On Wed, Aug 29, 2018 at 5:03 AM Henrik Bengtsson 
wrote:


# Issue

'x || y' performs 'x[1] || y' for length(x) > 1.  For instance (here
using R 3.5.1),

> c(TRUE, TRUE) || FALSE
[1] TRUE
> c(TRUE, FALSE) || FALSE
[1] TRUE
> c(TRUE, NA) || FALSE
[1] TRUE
> c(FALSE, TRUE) || FALSE
[1] FALSE

This property is symmetric in LHS and RHS (i.e. 'y || x' behaves the
same) and it also applies to 'x && y'.

Note also how the above truncation of 'x' is completely silent -
there's neither an error nor a warning being produced.


# Discussion/Suggestion

Using 'x || y' and 'x && y' with a non-scalar 'x' or 'y' is likely a
mistake.  Either the code is written assuming 'x' and 'y' are scalars,
or there is a coding error and the vectorized versions 'x | y' and 'x & y'
were intended.  Should 'x || y' always be considered a mistake if
'length(x) != 1' or 'length(y) != 1'?  If so, should it be a warning
or an error?  For instance,

> x <- c(TRUE, TRUE)
> y <- FALSE
> x || y
Error in x || y : applying scalar operator || to non-scalar elements
Execution halted

What about the case where 'length(x) == 0' or 'length(y) == 0'?  Today
'x || y' returns 'NA' in such cases, e.g.

> logical(0) || c(FALSE, NA)
[1] NA
> logical(0) || logical(0)
[1] NA
> logical(0) && logical(0)
[1] NA

I don't know the background for this behavior, but I'm sure there is
an argument behind that one.  Maybe it's simply that '||' and '&&'
should always return a scalar logical and neither TRUE nor FALSE can
be returned.

/Henrik

PS. This is in the same vein as
https://mailman.stat.ethz.ch/pipermail/r-devel/2017-March/073817.html
- in R (>=3.4.0) we now get that if (1:2 == 1) ... is an error if
_R_CHECK_LENGTH_1_CONDITION_=true



Re: [Rd] ROBUSTNESS: x || y and x && y to give warning/error if length(x) != 1 or length(y) != 1

2018-08-30 Thread Dénes Tóth

Hi,

I absolutely second Henrik's suggestion.

On 08/30/2018 01:09 PM, Emil Bode wrote:

I have to disagree, I think one of the advantages of '||' (or &&) is the lazy evaluation, 
i.e. you can use the first condition to "not care" about the second (and stop errors 
from being thrown).


I do not think Henrik's proposal implies that both arguments of `||` or 
`&&` should be evaluated before the condition is evaluated. It implies 
that if an argument *is* evaluated and its length does not equal one, an 
error should be thrown instead of the argument being silently truncated.

So your argument is orthogonal to the issue.


So if I want to check whether x is a length-one numeric with a value
between 0 and 1, I can do
'class(x) == "numeric" && length(x) == 1 && x > 0 && x < 1'.
In your proposal, having x = c(1, 2) would throw an error or multiple warnings.
Also, code that relies on the second argument not being evaluated would break,
as we need to evaluate y in order to know length(y).
There may be some benefit in checking length(x) only, though that could
also cause some false positives (e.g. 'x == -1 || length(x) == 0' would be a
bit ugly, but not necessarily wrong; the same goes for someone too lazy to
write x[1] instead of x).

And I don't really see the advantage. The coercion to length one is (I think)
a feature, not a bug. If I have/need a length-one x and a length-one y, why
not use '|' and '&'? I have to admit I only use them in if-statements, and if
I need an error to be thrown when x and y are not length one, I can use the
shorter versions and then the if throws a warning (or an error for a
length-0 or NA result).

I get it that for someone just starting in R, the differences between | and || 
can be confusing, but I guess that's just the price to pay for having a 
vectorized language.


I have been using R for about 10 years, and I regularly use `||` and 
`&&` for the standard purpose (implemented in most programming languages 
for the same purpose, that is, not evaluating further arguments when 
they are not needed to decide whether the condition is TRUE). I cannot 
recall a single case where I wanted to use them to evaluate whether the 
*first* elements of vectors fulfil the given condition.

However, I regularly write `||` or `&&` by mistake when I actually want 
`|` or `&`, and have no chance of spotting the error because of the 
silent truncation of the arguments.



Regards,
Denes





Best regards,
Emil Bode

Data-analyst

+31 6 43 83 89 33
emil.b...@dans.knaw.nl

DANS: Netherlands Institute for Permanent Access to Digital Research Resources
Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | i...@dans.knaw.nl | dans.knaw.nl

DANS is an institute of the Dutch Academy KNAW and funding organisation NWO.

On 29/08/2018, 05:03, "R-devel on behalf of Henrik Bengtsson" 
 wrote:


Re: [Rd] True length - length(unclass(x)) - without having to call unclass()?

2018-09-03 Thread Dénes Tóth

Hi Tomas,

On 09/03/2018 11:49 AM, Tomas Kalibera wrote:
Please don't do this to get the underlying vector length (or to achieve 
anything else). Setting/deleting attributes of an R object without 
checking the reference count violates R semantics, which in turn can 
have unpredictable results on R programs (essentially undebuggable 
segfaults now or more likely later when new optimizations or features 
are added to the language). Setting attributes on objects with reference 
count (currently NAMED value) greater than 0 (in some special cases 1 is 
ok) is cheating - please see Writing R Extensions - and getting speedups 
via cheating leads to fragile, unmaintainable and buggy code. 


Please note that data.table::setattr is an exported function of a widely 
used package (available from CRAN), and ?data.table::setattr describes 
why it might be useful.


Of course one has to use the set* functions from data.table with extreme 
care, but used in the right way they can help a lot. For example, there 
is no real danger in using them in internal functions where one can 
control what gets passed to the function or created within the function 
(so when one knows that the refcount == 0 condition holds).


(Notwithstanding the above, and also supporting your argument: it took 
me hours to debug a particular problem in one of my internal packages, 
see https://github.com/Rdatatable/data.table/issues/1281)


In the present case, an important and unanswered question is (cited from 
Henrik):

>>> However, I'm concerned that calling unclass(x) may trigger an
>>> expensive copy internally in some cases.  Is that concern unfounded?

If no copy is made, length(unclass(x)) beats length(setattr(..)) in all 
scenarios.
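As a quick empirical check of that concern, tracemem() (available in R builds with memory profiling enabled, which includes the standard CRAN binaries) prints a message whenever the traced object is duplicated. A hedged sketch, whose output depends on the R version:

```r
# Watch whether unclass() duplicates the underlying list.
x <- structure(as.list(1:1e5), class = "Foo")
tracemem(x)

len <- length(unclass(x))  # a "tracemem[...]" line here means a copy was made
stopifnot(len == 1e5L)
untracemem(x)
```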



Doing so 
in packages is particularly unhelpful to the whole community - packages 
should only use the public API as documented.


Similarly, getting the physical address of an object to hack around 
whether R has copied it or not should certainly not be done in packages, 
and R code should never work with, or even obtain, the physical address 
of an object. This is also why one cannot obtain such an address using 
base R (apart, in textual form, from certain diagnostic messages, where 
it can indeed be useful for low-level debugging).


Getting the physical address of the object was done exclusively for 
demonstration purposes. I totally agree that it should not be used for 
the purpose you described, and I have never ever done so.


Regards,
Denes



Tomas

On 09/02/2018 01:19 AM, Dénes Tóth wrote:


Re: [Rd] class() |--> c("matrix", "arrary") [was "head.matrix ..."]

2019-11-15 Thread Dénes Tóth

Hi Abby,

On 11/15/19 10:19 PM, Abby Spurdle wrote:

And indeed I think you are right on spot and this would mean
that indeed the implicit class
"matrix" should rather become c("matrix", "array").


I've made up my mind (and not been contradicted by my fellow R
corers) to try go there for  R 4.0.0   next April.


I'm not enthusiastic about matrices extending arrays.
If a matrix is an array, then shouldn't all vectors in R be arrays too?


The main distinguishing feature of matrices (and arrays) vs vectors is 
that they have a dimension attribute.


x <- as.list(letters[1:8]) # just to show that it generalizes not only 
to atomic vectors

is.vector(x) # TRUE
inherits(x, "matrix") # FALSE

dim(x) <- c(2, 4)
is.vector(x) # FALSE
inherits(x, "matrix") # TRUE
inherits(x, "array") # FALSE, but should be TRUE for consistency

dim(x) <- c(2, 2, 2)
is.vector(x) # FALSE
inherits(x, "matrix") # FALSE
inherits(x, "array") # TRUE


A matrix should really be nothing else than an array where 
length(dim(x)) == 2L.


IMHO the only special object which has a dimension attribute but is not 
a special case of an array is the data.frame.
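The point that a matrix is merely an array with a length-2 dim attribute can be checked directly; a small self-contained illustration:

```r
# A matrix and a 2D array built from the same data are identical objects:
# both are vectors carrying a dim attribute of length 2.
m <- matrix(1:4, nrow = 2)
a <- array(1:4, dim = c(2, 2))
stopifnot(identical(m, a))

# Removing the dim attribute turns either back into a plain vector.
dim(m) <- NULL
stopifnot(is.vector(m), identical(m, 1:4))
```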



Denes






#mockup
class (1)

[1] "numeric" "array"

Which is a bad idea.
It contradicts the central principle that R uses "Vectors" rather than "Arrays".
And I feel that matrices are and should be, a special case of vectors.
(With their inheritance from vectors taking precedence over anything else).

If the motivation is to solve the problem of 2D arrays, automatically
being mapped to matrices:


class (array (1, c (2, 2) ) )

[1] "matrix"

Then wouldn't it be better, to treat 2D arrays, as a special case, and
leave matrices as they are?


#mockup
class (array (1, c (2, 2) ) )

[1] "array2d" "matrix" "array"

Then 2D arrays would have access to both matrix and array methods...

Note, I don't want to enter into (another) discussion on the
differences between implicit class and classes defined via a class
attribute.
That's another discussion, which has little to do with my points above.



Re: [Rd] Patch idea: an environment variable for setting the user ID

2019-11-22 Thread Dénes Tóth
Maybe a further thing to consider is to introduce an environment 
variable by which one can skip `add_build_stamp_to_description_file()` 
and any other steps that affect bitwise reproducibility during the 
build process. If two users build the same package on exactly the same 
hardware and in the same software environment, the tarballs are 
expected to be identical. This is not the case now.


Denes

On 11/22/19 9:25 PM, Henrik Bengtsson wrote:

Another thing to consider if one wants to anonymize the build is the
UID/GID of the files in the tarball.  So there might be a need for a
R_BUILD_UID and R_BUILD_GID, e.g. by setting those to 32767
("nobody").

/Henrik

On Fri, Jan 25, 2019 at 9:25 AM Will L  wrote:


Thanks, Kurt.

I think I now have enough time to write a patch. What are the steps? I have
read https://www.r-project.org/bugs.html#how-to-submit-patches but I do not
seem to have permission to create a Bugzilla account at
https://bugs.r-project.org/bugzilla/.

Will


On Mon, Nov 12, 2018 at 2:46 AM Kurt Hornik  wrote:


Will L writes:



To R-devel,
In `R CMD build`, the ID of the user is automatically inserted into the
DESCRIPTION file, e.g.



Packaged: 2018-11-06 14:01:50 UTC; 




This is problematic for those of us who work in corporate settings. We

must

not divulge our user IDs in the packages we develop and release.



Jim Hester pointed out that these two lines in
`add_build_stamp_to_description_file()`
<https://github.com/wch/r-source/blob/521c90a175d67475b9f1b43d7ae68bc48062d8e6/src/library/tools/R/build.R#L170-L171>
are responsible. Could we consider his suggestion of using an optional
environment variable to overwrite the default behavior?



user <- Sys.getenv("R_BUILD_USERNAME")
if (!nzchar(user)) user <- Sys.info()["user"]
if (user == "unknown") user <- Sys.getenv("LOGNAME")


Yep, something along these lines should be possible.
R_BUILD_USER or R_BUILD_LOGNAME may seem more natural though ...

Best
-k




Will Landau
--
wlandau.github.io
linkedin.com/in/wlandau
github.com/wlandau








--
wlandau.github.io
linkedin.com/in/wlandau
github.com/wlandau



[Rd] support of `substitute(...())`

2020-03-12 Thread Dénes Tóth

Dear R Core Team,

I learnt approx. two years ago in this mailing list that one can use the 
following "trick" to get a (dotted pair)list of the ellipsis arguments 
inside a function:


`substitute(...())`

Now my problem is that I cannot find any occurrence of this call within 
the R source; the most frequent solution there is 
`substitute(list(...))[-1L]`.


I would like to know if:
1) substitute(...()) is a trick or a feature in the language;
2) it will be supported in the future;
3) when (in which R version) it was introduced.

A hint on where to look for the machinery in the R source would also be 
appreciated.


Regards,
Denes

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] support of `substitute(...())`

2020-03-12 Thread Dénes Tóth



On 3/12/20 4:06 PM, William Dunlap wrote:
Note that substitute(...()) and substitute(someFunc(...))[-1] give 
slightly different results, the former a pairlist and the latter a call.

   > str((function(...) substitute(...()))(stop(1), stop(2), stop(3)))
   Dotted pair list of 3
    $ : language stop(1)
    $ : language stop(2)
    $ : language stop(3)
   > str((function(...) substitute(someFunc(...))[-1])(stop(1), stop(2), stop(3)))
    language stop(1)(stop(2), stop(3))


Yes, I am aware of this difference. In my use cases, the ...() form 
gives the result that I prefer (a pairlist).




The ...() idiom has been around for a long time, but more recently 
(slightly after R-3.4.0?) the ...elt(n) and ...length() functions were 
introduced so you don't have to use it much.  


Yes, I know both.

I don't see a ...names() 
function that would give the names of the ... arguments, i.e. 
names(substitute(...())).


Exactly, this is a frequent use case. Occasionally I use it in other 
cases as well where I deliberately do not want to evaluate the arguments 
passed as dots.


What I am most interested in is whether this is a 'trick' or a legal use 
of a (rather unadvertised) feature of the language.




Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Mar 12, 2020 at 2:09 AM Dénes Tóth <toth.de...@kogentum.hu> wrote:





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Request: tools::md5sum should accept connections and finally in-memory objects

2020-05-01 Thread Dénes Tóth




On 5/1/20 11:35 PM, Duncan Murdoch wrote:
The tools package is not for users, it's for functions that R uses in 
installing packages, checking them, etc. 


I think the target group for this functionality is the group of R 
developers, not regular R users.


If you want a function for 
users, it would belong in utils.  But what's wrong with the digest 
package?  What's the argument that R Core should take this on?


There is nothing wrong with the digest package except for being an extra 
dependency which could be avoided if an already implemented C function 
were available at the R level.


I do understand that given the load on R Core, they do include new 
features and the related burden of maintenance only if it is absolutely 
necessary. This is why I asked first whether there is a particular 
reason not to expose an already existing (base-R) implementation. I 
think it is reasonable to assume that 'md5_buffer' exists for a reason - 
but probably there is a reason why it never became part of any exported 
function. Now I checked the history of the md5.c file; it was last 
edited 8 years ago. Somewhat surprisingly, md5_buffer was already 
included in the original file (created 17 years ago), but marked as 
UNUSED 12 years ago.


Just to clarify: I do not want to suggest that the R Core team should take over 
all functionalities of the digest package. I do really focus on 
computing MD5 digests, which is already possible for files. My 
suggestion for a more general function was meant for keeping potential 
further enhancements in mind.





Duncan Murdoch

On 01/05/2020 5:00 p.m., Dénes Tóth wrote:


AFAIK there is no hashing utility in base R which can create hash
digests of arbitrary R objects. However, as also described by Henrik
Bengtsson in [1], we have tools::md5sum() which calculates MD5 hashes of
files. Calculating hashes of in-memory objects is a very common task in
several areas, as demonstrated by the popularity of the 'digest' package
(~850,000 downloads/month).

Upon the inspection of the relevant files in the R-source (e.g., [2] and
[3]), it seems all building blocks have already been implemented so that
hashing should not be restricted to files. I would like to ask:

1) Why is md5_buffer unused?:
In src/library/tools/src/md5.c [see 2], md5_buffer is implemented which
seems to be the counterpart of md5_stream for non-file inputs:

---
#ifdef UNUSED
/* Compute MD5 message digest for LEN bytes beginning at BUFFER.  The
 result is always in little endian byte order, so that a byte-wise
 output yields to the wanted ASCII representation of the message
 digest.  */
static void *
md5_buffer (const char *buffer, size_t len, void *resblock)
{
    struct md5_ctx ctx;

    /* Initialize the computation context.  */
    md5_init_ctx (&ctx);

    /* Process whole buffer but last len % 64 bytes.  */
    md5_process_bytes (buffer, len, &ctx);

    /* Put result in desired memory area.  */
    return md5_finish_ctx (&ctx, resblock);
}
#endif
---

2) How can the R-community help so that this feature becomes available
in package 'tools'?

Suggestions:
As a first step, it would be great if tools::md5sum would support
connections (credit goes to Henrik for the idea). E.g., instead of the
signature tools::md5sum(files), we could have tools::md5sum(files, conn
= NULL), which would allow:

x <- runif(10)
tools::md5sum(conn = rawConnection(serialize(x, NULL)))

To avoid the inconsistency between 'files' (which computes the hash
digests in a vectorized manner, that is, one for each file) and 'conn'
(which expects a single connection), and to make it easier to extend the
hashing for other algorithms without changing the main R interface, a
more involved solution would be to introduce tools::hash and
tools::hashes, in a similar vein to digest::digest and 
digest::getVDigest.


Regards,
Denes


[1]: https://github.com/HenrikBengtsson/Wishlist-for-R/issues/21
[2]:
https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/md5.c#L172 


[3]:
https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/Rmd5.c#L27 



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Request: tools::md5sum should accept connections and finally in-memory objects

2020-05-01 Thread Dénes Tóth




On 5/1/20 11:09 PM, John Mount wrote:

Perhaps use the digest package? Isn't "R the R packages?"


I think it is clear that I am aware of the existence of the digest 
package and also of other packages with similar functionality, e.g. the 
fastdigest package. (And I actually do use digest as I guess 99% percent 
of the R developers do at least as an indirect dependency.) The point is 
that
a) digest is a wonderful and very stable package, but still, it is a 
user-contributed package, whereas
b) 'tools' is a base package which is included by default in all R 
installations, and
c) tools::md5sum already exists, with almost all building blocks to 
allow its extension to calculate MD5 hashes of R objects, and
d) there is high demand in the R community for being able to calculate 
hashes.


So yes, if one wants to use all the utilities or the various algos that 
the digest package provides, one should install and load it. But if one 
can live with MD5 hashes, why not use the built-in R function? (Well, 
without serializing an object to a file, calling tools::md5sum, and then 
cleaning up the file.)
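For reference, a sketch of that file-based workaround (the helper name `md5_of_object` is made up; note that the digest depends on R's serialization format, so hashes are only comparable between compatible R versions):

```r
# Hash an in-memory R object with base R only: serialize it to a
# temporary file and let tools::md5sum() hash that file.
# `md5_of_object` is a hypothetical helper, not an existing function.
md5_of_object <- function(x) {
  f <- tempfile()
  on.exit(unlink(f))
  writeBin(serialize(x, NULL), f)
  unname(tools::md5sum(f))
}

md5_of_object(1:10)  # a 32-character hexadecimal MD5 digest
```

This is exactly the detour the proposal would eliminate: serialize, write, hash, clean up.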




On May 1, 2020, at 2:00 PM, Dénes Tóth <toth.de...@kogentum.hu> wrote:



AFAIK there is no hashing utility in base R which can create hash 
digests of arbitrary R objects. However, as also described by Henrik 
Bengtsson in [1], we have tools::md5sum() which calculates MD5 hashes 
of files. Calculating hashes of in-memory objects is a very common 
task in several areas, as demonstrated by the popularity of the 
'digest' package (~850,000 downloads/month).


Upon the inspection of the relevant files in the R-source (e.g., [2] 
and [3]), it seems all building blocks have already been implemented 
so that hashing should not be restricted to files. I would like to ask:


1) Why is md5_buffer unused?:
In src/library/tools/src/md5.c [see 2], md5_buffer is implemented 
which seems to be the counterpart of md5_stream for non-file inputs:


---
#ifdef UNUSED
/* Compute MD5 message digest for LEN bytes beginning at BUFFER.  The
  result is always in little endian byte order, so that a byte-wise
  output yields to the wanted ASCII representation of the message
  digest.  */
static void *
md5_buffer (const char *buffer, size_t len, void *resblock)
{
 struct md5_ctx ctx;

 /* Initialize the computation context.  */
 md5_init_ctx (&ctx);

 /* Process whole buffer but last len % 64 bytes.  */
 md5_process_bytes (buffer, len, &ctx);

 /* Put result in desired memory area.  */
 return md5_finish_ctx (&ctx, resblock);
}
#endif
---

2) How can the R-community help so that this feature becomes available 
in package 'tools'?


Suggestions:
As a first step, it would be great if tools::md5sum would support 
connections (credit goes to Henrik for the idea). E.g., instead of the 
signature tools::md5sum(files), we could have tools::md5sum(files, 
conn = NULL), which would allow:


x <- runif(10)
tools::md5sum(conn = rawConnection(serialize(x, NULL)))

To avoid the inconsistency between 'files' (which computes the hash 
digests in a vectorized manner, that is, one for each file) and 'conn' 
(which expects a single connection), and to make it easier to extend 
the hashing for other algorithms without changing the main R 
interface, a more involved solution would be to introduce tools::hash 
and tools::hashes, in a similar vein to digest::digest and 
digest::getVDigest.


Regards,
Denes


[1]: https://github.com/HenrikBengtsson/Wishlist-for-R/issues/21
[2]: 
https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/md5.c#L172
[3]: 
https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/Rmd5.c#L27


__
R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


---
John Mount
http://www.win-vector.com/
Our book: Practical Data Science with R
http://practicaldatascience.com







__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Request: tools::md5sum should accept connections and finally in-memory objects

2020-05-01 Thread Dénes Tóth



AFAIK there is no hashing utility in base R which can create hash 
digests of arbitrary R objects. However, as also described by Henrik 
Bengtsson in [1], we have tools::md5sum() which calculates MD5 hashes of 
files. Calculating hashes of in-memory objects is a very common task in 
several areas, as demonstrated by the popularity of the 'digest' package 
(~850,000 downloads/month).


Upon the inspection of the relevant files in the R-source (e.g., [2] and 
[3]), it seems all building blocks have already been implemented so that 
hashing should not be restricted to files. I would like to ask:


1) Why is md5_buffer unused?:
In src/library/tools/src/md5.c [see 2], md5_buffer is implemented which 
seems to be the counterpart of md5_stream for non-file inputs:


---
#ifdef UNUSED
/* Compute MD5 message digest for LEN bytes beginning at BUFFER.  The
   result is always in little endian byte order, so that a byte-wise
   output yields to the wanted ASCII representation of the message
   digest.  */
static void *
md5_buffer (const char *buffer, size_t len, void *resblock)
{
  struct md5_ctx ctx;

  /* Initialize the computation context.  */
  md5_init_ctx (&ctx);

  /* Process whole buffer but last len % 64 bytes.  */
  md5_process_bytes (buffer, len, &ctx);

  /* Put result in desired memory area.  */
  return md5_finish_ctx (&ctx, resblock);
}
#endif
---

2) How can the R-community help so that this feature becomes available 
in package 'tools'?


Suggestions:
As a first step, it would be great if tools::md5sum would support 
connections (credit goes to Henrik for the idea). E.g., instead of the 
signature tools::md5sum(files), we could have tools::md5sum(files, conn 
= NULL), which would allow:


x <- runif(10)
tools::md5sum(conn = rawConnection(serialize(x, NULL)))

To avoid the inconsistency between 'files' (which computes the hash 
digests in a vectorized manner, that is, one for each file) and 'conn' 
(which expects a single connection), and to make it easier to extend the 
hashing for other algorithms without changing the main R interface, a 
more involved solution would be to introduce tools::hash and 
tools::hashes, in a similar vein to digest::digest and digest::getVDigest.


Regards,
Denes


[1]: https://github.com/HenrikBengtsson/Wishlist-for-R/issues/21
[2]: 
https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/md5.c#L172
[3]: 
https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/Rmd5.c#L27


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is there any way to check the class of an ALTREP?

2020-10-19 Thread Dénes Tóth

Benjamin,

You happened to send a link which points to the OP's own package :) I 
think Jiefei would like to know how one can "officially" determine if an 
arbitrary ALTERP object belongs to a class that he owns.


Regards,
Denes


On 10/19/20 10:22 AM, Benjamin Christoffersen wrote:

It seems as if you can you use the ALTREP macro as done in this
package: 
https://github.com/Jiefei-Wang/SharedObject/blob/804b6ac58c63a4bae95343ab43e8b1547b07ee6b/src/C_interface.cpp#L185

and in base R: 
https://github.com/wch/r-source/blob/54fbdca9d3fc63437d9e697f442d32732fb4f443/src/include/Rinlinedfuns.h#L118

The macro is defined here in Rinternals.h:
https://github.com/wch/r-source/blob/abb550c99b3927e5fc03d12f1a8e7593fddc04d2/src/include/Rinternals.h#L325

On Mon, 19 Oct 2020 at 10:13, Jiefei Wang wrote:


Hi all,

I would like to determine if an ALTREP object is from my package, I see
there is a function `ALTREP_CLASS` defined in RInternal.h but its return
value is neither a `R_altrep_class_t` object nor an STRSXP representing a
class name. I do not know how to correctly use it. Any suggestions?

Thanks,
Jiefei

 [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: New pipe operator

2020-12-06 Thread Dénes Tóth

Dear Luke,

In the meantime I checked the R-syntax branch and the docs; they are 
very helpful. I would also like to thank you for putting effort into 
this feature. Keeping it at the syntax level is also a very smart 
decision. However, the current API might not exploit the full power of 
the basic idea.


1) Requiring either an anonymous function or a function call, but not 
allowing for symbols which point to functions is inconsistent and will 
be misleading for non-experts.


foo <- function(x) x
identical(foo, function(x) x)

mtcars |> foo   #bang!
mtcars |> function(x) x #fine?

You stated in :
"
Another variation supported by the implementation is that a symbol on
the RHS is interpreted as the name of a function to call with the LHS
as argument:

```r
> quote(x |> f)
f(x)
```
"

So clearly this is not an implementation issue but a design decision.

As a remedy, two different pipe operators could be introduced:

LHS |> RHS-> RHS is treated as a function call
LHS |>> RHS   -> RHS is treated as a function

If |>> is used, it would not matter which notation is used for the RHS 
expression; the parser would assume it evaluates to a function.


2) Simplified lambda expression:
IMHO in the vast majority of use cases, this is used for single-argument 
functions, so parentheses would not be required. Hence, both forms would 
be valid and equivalent:


\x x + 1
\(x) x + 1


3) Function composition:
Allowing for concise composition of functions would be a great feature. 
E.g., instead of


foo <- function(x) print(mean(sqrt(x), na.rm = TRUE), digits = 2)

or

foo <- \x {x |> sqrt() |> mean(na.rm = TRUE) |> print(digits = 2)}

one could write

foo <- \x |> sqrt() |> mean(na.rm = TRUE) |> print(digits = 2)

So basically if the lambda argument is followed by a pipe operator, the 
pipe chain is transformed to a function body where the first lambda 
argument is inserted into the first position of the pipeline.
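For comparison, a run-time (rather than syntax-level) approximation of such composition, assuming a made-up helper `compose` similar in spirit to purrr::compose:

```r
# compose(f, g, ...) returns a function applying f, then g, ... left to
# right; a run-time stand-in for the proposed syntax-level composition.
# `compose` is a hypothetical helper, not part of base R.
compose <- function(...) {
  fns <- list(...)
  function(x) Reduce(function(acc, f) f(acc), fns, x)
}

foo <- compose(sqrt, function(x) mean(x, na.rm = TRUE))
foo(c(1, 4, 9, NA))
# → 2
```

The syntax-level proposal would avoid both the helper and the extra closure call.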



Best,
Denes


On 12/5/20 7:10 PM, luke-tier...@uiowa.edu wrote:

We went back and forth on this several times. The key advantage of
requiring parentheses is to keep things simple and consistent.  Let's
get some experience with that. If experience shows requiring
parentheses creates too many issues then we can add the option of
dropping them later (with special handling of :: and :::). It's easier
to add flexibility and complexity than to restrict it after the fact.

Best,

luke

On Sat, 5 Dec 2020, Hugh Parsonage wrote:


I'm surprised by the aversion to

mtcars |> nrow

over

mtcars |> nrow()

and I think the decision to disallow the former should be
reconsidered.  The pipe operator is only going to be used when the rhs
is a function, so there is no ambiguity with omitting the parentheses.
If it's disallowed, it becomes inconsistent with other treatments like
sapply(mtcars, typeof) where sapply(mtcars, typeof()) would just be
noise.  I'm not sure why this decision was taken.

If the only issue is with the double (and triple) colon operator, then
ideally `mtcars |> base::head` should resolve to `base::head(mtcars)`
-- in other words, demote the precedence of |>

Obviously (looking at the R-Syntax branch) this decision was
considered, put into place, then dropped, but I can't see why
precisely.

Best,


Hugh.







On Sat, 5 Dec 2020 at 04:07, Deepayan Sarkar wrote:


On 04/12/2020, Duncan Murdoch wrote:


On 04/12/2020 8:13 a.m., Hiroaki Yutani wrote:

  Error: function '::' not supported in RHS call of a pipe


To me, this error looks much more friendly than magrittr's error.
Some of them got too used to specify functions without (). This
is OK until they use `::`, but when they need to use it, it takes
hours to figure out why

mtcars %>% base::head
#> Error in .::base : unused argument (head)

won't work but

mtcars %>% head

works. I think this is a too harsh lesson for ordinary R users to
learn `::` is a function. I've been wanting for magrittr to drop the
support for a function name without () to avoid this confusion,
so I would very much welcome the new pipe operator's behavior.
Thank you all the developers who implemented this!


I agree, it's an improvement on the corresponding magrittr error.

I think the semantics of not evaluating the RHS, but treating the pipe
as purely syntactical is a good decision.

I'm not sure I like the recommended way to pipe into a particular 
argument:


   mtcars |> subset(cyl == 4) |> \(d) lm(mpg ~ disp, data = d)

or

   mtcars |> subset(cyl == 4) |> function(d) lm(mpg ~ disp, data = d)

both of which are equivalent to

   mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data = d))()


It's tempting to suggest it should allow something like

   mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)


Which is really not that far off from

mtcars |> subset(cyl == 4) |> \(.) lm(mpg ~ disp, data = .)

once you get used to it.

One consequence of the implementation is that it's not clear how
multiple occurrences of the 

Re: [Rd] [External] Re: New pipe operator

2020-12-06 Thread Dénes Tóth




On 12/6/20 4:32 PM, Duncan Murdoch wrote:
> On 06/12/2020 9:43 a.m., Dénes Tóth wrote:
>> Dear Luke,
>>
>> In the meantime I checked the R-syntax branch and the docs; they are
>> very helpful. I would also like to thank you for putting effort into
>> this feature. Keeping it at the syntax level is also a very smart
>> decision. However, the current API might not exploit the full power of
>> the basic idea.
>>
>> 1) Requiring either an anonymous function or a function call, but not
>> allowing for symbols which point to functions is inconsistent and will
>> be misleading for non-experts.
>>
>> foo <- function(x) x
>> identical(foo, function(x) x)
>>
>> mtcars |> foo   #bang!
>> mtcars |> function(x) x #fine?
>
> You are missing the point.  The value of the RHS is irrelevant to the
> transformation.  All that matters is its form.  So "foo" and
> "function(x) x" are completely different things, even if identical()
> thinks their value is the same.

We are at the syntax level, so of course we do not know the value of the 
RHS when the parsing occurs. I *do* understand that the *form* is 
important here, but how do you explain this to a rookie R user? He will 
see that he entered two expressions which he thinks are identical, even 
though they are not identical at the stage where parsing occurs.


Also think of the potential users of this syntax. There are at least two 
groups:
1) ~95% of the users: active users of `%>%`. My experience is that the 
vast majority of them do not use the "advanced" features of magrittr; 
however, they have got used to things like mtcars |> print. Provide them 
with the RHS-as-symbol syntax and they will be happy - they have a 
plug-and-forget replacement. Or enforce a function call - they will 
be unhappy, and will not adopt the new syntax.
2) ~5% of the users (including me): have not used magrittr or any other 
(probably better) implementations (e.g., pipeR, wrapr) of the pipe 
operator because it could lead to nasty performance issues, bugs, and 
debugging problems. However, from a functional-programming point of 
view, these users might prefer the new syntax and as little typing as 
possible.


>
> It's also true that "foo()" and "function(x) x" are completely
> different, but they are well-defined forms:  one is a call, the other is
> an anonymous function definition.
>
> Accepting a plain "foo" would add a third form (a name), which might
> make sense, but hardly gains anything:

I would reverse the argumentation: Luke has a working implementation for 
the case where the RHS is a single symbol. What do we lose if we keep it?


Best,
Denes

> whereas dropping the anonymous
> function definition costs quite a bit.  Without special-casing anonymous
> function definitions you'd need to enter
>
> mtcars |> (function(x) x)()
>
> or
>
> mtcars |> (\(x) x)()
>
> which are both quite difficult to read.
>
> Duncan Murdoch
>
>>
>> You stated in :
>> "
>> Another variation supported by the implementation is that a symbol on
>> the RHS is interpreted as the name of a function to call with the LHS
>> as argument:
>>
>> ```r
>>   > quote(x |> f)
>> f(x)
>> ```
>> "
>>
>> So clearly this is not an implementation issue but a design decision.
>>
>> As a remedy, two different pipe operators could be introduced:
>>
>> LHS |> RHS-> RHS is treated as a function call
>> LHS |>> RHS   -> RHS is treated as a function
>>
>> If |>> is used, it would not matter which notation is used for the RHS
>> expression; the parser would assume it evaluates to a function.
>>
>> 2) Simplified lambda expression:
>> IMHO in the vast majority of use cases, this is used for single-argument
>> functions, so parenthesis would not be required. Hence, both forms would
>> be valid and equivalent:
>>
>> \x x + 1
>> \(x) x + 1
>>
>>
>> 3) Function composition:
>> Allowing for concise composition of functions would be a great feature.
>> E.g., instead of
>>
>> foo <- function(x) print(mean(sqrt(x), na.rm = TRUE), digits = 2)
>>
>> or
>>
>> foo <- \x {x |> sqrt() |> mean(na.rm = TRUE) |> print(digits = 2)}
>>
>> one could write
>>
>> foo <- \x |> sqrt() |> mean(na.rm = TRUE) |> print(digits = 2)
>>
>> So basically if the lambda argument is followed by a pipe operator, the
>> pipe chain is transformed to a function body where the first lambda
>> argument is inserted into the first position of the pipeline.

Re: [Rd] [External] Re: New pipe operator

2020-12-06 Thread Dénes Tóth

Hi Gabriel,

Thanks for the comments. See inline.

On 12/6/20 8:16 PM, Gabriel Becker wrote:

Hi Denes,

On Sun, Dec 6, 2020 at 6:43 AM Dénes Tóth <toth.de...@kogentum.hu> wrote:


Dear Luke,

In the meantime I checked the R-syntax branch and the docs; they are
very helpful. I would also like to thank you for putting effort into
this feature. Keeping it at the syntax level is also a very smart
decision. However, the current API might not exploit the full power of
the basic idea.

1) Requiring either an anonymous function or a function call, but not
allowing for symbols which point to functions is inconsistent and will
be misleading for non-experts.

foo <- function(x) x
identical(foo, function(x) x)

mtcars |> foo               #bang!
mtcars |> function(x) x     #fine?

You stated in :
"
Another variation supported by the implementation is that a symbol on
the RHS is interpreted as the name of a function to call with the LHS
as argument:

```r
  > quote(x |> f)
f(x)
```
"

So clearly this is not an implementation issue but a design decision.

As a remedy, two different pipe operators could be introduced:

LHS |> RHS    -> RHS is treated as a function call
LHS |>> RHS   -> RHS is treated as a function

If |>> is used, it would not matter which notation is used for the RHS
expression; the parser would assume it evaluates to a function.


I think multiplying the operators would not be a net positive. You'd 
then have to remember and mix them when you mix anonymous functions and 
non-anonymous functions.  It would result in


LHS |> RHS1() |>> \(x,y) blablabla |> RHS3()

I think that's too much intricacy. Better to be a little more 
restrictive in a way that (honestly doesn't really hurt anything afaics, 
and) guarantees consistency.




That was just a secondary option for the case where pure symbols are 
disallowed on the RHS. The point is that one cannot avoid inconsistency 
here because of practical considerations; let us admit, R has tons of 
inconsistencies which are usually motivated by making interactive data 
analysis more convenient. To me it seems more inconsistent to allow for 
function calls and functions but not symbols - either allow all of them 
or be strict and enforce function calls.




2) Simplified lambda expression:
IMHO in the vast majority of use cases, this is used for
single-argument
functions, so parenthesis would not be required. Hence, both forms
would
be valid and equivalent:

\x x + 1
\(x) x + 1


Why special-case something here when sometimes you'll want more than one 
argument? The parentheses seem really not a big deal. So I don't 
understand the motivation here, if I'm being honest.


Just as I said before: because of practical considerations. On a 
Hungarian keyboard layout, this is how one types the backslash 
character: RightAlt+Q. Parentheses: Shift+8 (left), Shift+9 (right). 
This is how you type 'function' in the R terminal: fu+TAB. I do not 
really see the point of the new notation as it is now.





3) Function composition:
Allowing for concise composition of functions would be a great feature.
E.g., instead of

foo <- function(x) print(mean(sqrt(x), na.rm = TRUE), digits = 2)

or

foo <- \x {x |> sqrt() |> mean(na.rm = TRUE) |> print(digits = 2)}

one could write

foo <- \x |> sqrt() |> mean(na.rm = TRUE) |> print(digits = 2)

So basically if the lambda argument is followed by a pipe operator, the
pipe chain is transformed to a function body where the first lambda
argument is inserted into the first position of the pipeline.


This one I disagree with very strongly. Reading pipelines would suddenly 
require a /much/ higher cognitive load than before because you have to 
model that complexity just to read it and know what it says. The 
brackets there seem like an extremely low price to pay to avoid that. 
Operator precedence should be extremely and easily predictable.




Unfortunately I could not come up with a better solution to approximate 
a function composition operator (supporting tacit/pointfree-style 
programming) which avoids the introduction of a separate function (like 
e.g. purrr::compose).


In Haskell:
floor . sqrt

In Julia (looks nice but requires \circTAB or custom keybinding):
floor ∘ sqrt

In R: ?


Best,
Denes






Best,
Denes


On 12/5/20 7:10 PM, luke-tier...@uiowa.edu
<mailto:luke-tier...@uiowa.edu> wrote:
 > We went back and forth on this several times. The key advantage of
 > requiring parentheses is to keep things simple and consistent.  Let's
 > get some experience with that. If experience shows requiring
 > parentheses creates too many issues then we can add the option of
 > dropping them lat

Re: [Rd] New pipe operator

2020-12-07 Thread Dénes Tóth




On 12/7/20 11:09 PM, Gabriel Becker wrote:

On Mon, Dec 7, 2020 at 11:05 AM Kevin Ushey  wrote:


IMHO the use of anonymous functions is a very clean solution to the
placeholder problem, and the shorthand lambda syntax makes it much
more ergonomic to use. Pipe implementations that crawl the RHS for
usages of `.` are going to be more expensive than the alternatives. It
is nice that the `|>` operator is effectively the same as a regular R
function call, and given the identical semantics could then also be
reasoned about the same way regular R function calls are.



I agree. That said, one thing that maybe could be done, though I'm not
super convinced its needed, is make a "curry-stuffed pipe", where something
like

LHS |^pipearg^> RHS(arg1 = 5, arg3 = 7)

Would parse to

RHS(pipearg = LHS, arg1 = 5, arg3 = 7)



This gave me the idea that naming the arguments can be used to skip the 
placeholder issue:


"funny" |> sub(pattern = "f", replacement = "b")

Of course this breaks if the maintainer changes the order of the 
function arguments (which is not a nice practice but happens).
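A quick check of the named-argument trick (assumes R >= 4.1, where the base pipe `|>` is available):

```r
# `|>` inserts the LHS as the first argument of the RHS call; because
# `pattern` and `replacement` are passed by name, the LHS matches `x`:
"funny" |> sub(pattern = "f", replacement = "b")
# equivalent to: sub("funny", pattern = "f", replacement = "b")
# → "bunny"
```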


An option could be to allow for a missing argument in the first position, 
but this might add further undesired complexity, so probably not worth 
the effort:


"funny" |> sub(x =, "f", "b")

So basically the parsing rule would be:

LHS |> RHS(arg=, ...) -> RHS(arg=LHS, ...)




(Assuming we could get the parser to handle |^bla^> correctly)

For argument-position issues that would be sufficient. For more complicated
expressions, e.g., those that would use the placeholder multiple times or
inside compound expressions, requiring anonymous functions seems quite
reasonable to me. And honestly, while I kind of like it, I'm not sure if
that "stuffed pipe" expression (assuming we could get the parser to capture
it correctly) reads to me as nicer than the following, anyway.

LHS |> \(x) RHS(arg1 = 5, pipearg = x, arg3 = 7)

~G



I also agree usages of the `.` placeholder can make the code more
challenging to read, since understanding the behavior of a piped
expression then requires scouring the RHS for usages of `.`, which can
be challenging in dense code. Piping to an anonymous function makes
the intent clear to the reader: the programmer is likely piping to an
anonymous function because they care where the argument is used in the
call, and so the reader of code should be aware of that.

Best,
Kevin



On Mon, Dec 7, 2020 at 10:35 AM Gabor Grothendieck
 wrote:


On Mon, Dec 7, 2020 at 12:54 PM Duncan Murdoch wrote:

An advantage of the current implementation is that it's simple and easy
to understand.  Once you make it a user-modifiable binary operator,
things will go kind of nuts.

For example, I doubt if there are many users of magrittr's pipe who
really understand its subtleties, e.g. the example in Luke's paper

where

1 %>% c(., 2) gives c(1,2), but 1 %>% c(c(.), 2) gives c(1, 1, 2). (And
I could add 1 %>% c(c(.), 2, .) and  1 %>% c(c(.), 2, . + 2)  to
continue the fun.)


The rule is not so complicated.  Automatic insertion is done unless
you use dot in the top level function or if you surround it with
{...}.  It really makes sense since if you use gsub(pattern,
replacement, .) then surely you don't want automatic insertion and if
you surround it with { ... } then you are explicitly telling it not
to.

Assuming the existence of placeholders a possible simplification would
be to NOT do automatic insertion if { ... } is used and to use it
otherwise although personally having used it for some time I find the
existing rule in magrittr generally does what you want.
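The rule Gabor describes can be checked directly (assumes the magrittr package is installed):

```r
# Demonstrating magrittr's automatic-insertion rule: insertion of the LHS
# as first argument is skipped only when the dot appears as a top-level
# argument of the RHS call, or when the RHS is wrapped in braces.
library(magrittr)

1 %>% c(., 2)         # dot used in the top-level call: no auto-insertion
# → c(1, 2)

1 %>% c(c(.), 2)      # dot only inside a nested call: LHS also inserted first
# → c(1, 1, 2)

1 %>% { c(c(.), 2) }  # braces suppress auto-insertion explicitly
# → c(1, 2)
```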

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] New pipe operator

2020-12-04 Thread Dénes Tóth



On 12/4/20 3:05 PM, Duncan Murdoch wrote:

...

It's tempting to suggest it should allow something like

   mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)

which would be expanded to something equivalent to the other versions: 
but that makes it quite a bit more complicated.  (Maybe _ or \. should 
be used instead of ., since those are not legal variable names.)


I support the idea of using an underscore (_) as the placeholder symbol. 
Syntactic sugar works best if 1) it requires fewer keystrokes and/or 2) 
it is easier to read compared to the "normal" syntax, and 3) it cannot 
lead to unexpected bugs (which is a major problem with the magrittr 
pipe). Using '_' fulfills all of these criteria since '_' cannot clash 
with any variable in the environment.
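For illustration, this is the syntax as later adopted in R >= 4.2,
where `_` must be passed to a named argument (the model below is only
an example):

```r
# `_` marks where the piped value goes; requires R >= 4.2:
fit <- mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = _)
class(fit)
#> [1] "lm"
```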


Denes

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] return (x+1) * 1000

2020-11-20 Thread Dénes Tóth

Or even more illustratively:

uneval_after_return <- function(x) {
  return(x) * stop("Not evaluated")
}
uneval_after_return(1)
# [1] 1

On 11/20/20 10:12 PM, Mateo Obregón wrote:

Dear r-developers-

After many years of using and coding in R and other languages, I came across
something that I think should be flagged by the parser:

bug <- function (x) {
  return (x + 1) * 1000
}

> bug(1)
[1] 2

The return() call is not like any other function call that returns a value to
the point where it was called from. I think this should straightforwardly be
handled in the parser by flagging it as a syntactic error.

Thoughts?

Mateo.
--
Mateo Obregón.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





Re: [Rd] return (x+1) * 1000

2020-11-20 Thread Dénes Tóth
Yes, the behaviour of return() is absolutely consistent. I am wondering 
though how many experienced R developers would predict the correct 
return value just by looking at those code snippets.


On 11/21/20 12:33 AM, Gabriel Becker wrote:

And the related:

> f = function() stop(return("lol"))
> f()
[1] "lol"


I have a feeling all of this is just return() performing correctly 
though. If there are already R CMD check checks for this kind of thing 
(I wasn't sure, but I'm hearing from others there may be), that may be 
(and/or may need to be) sufficient.


~G

On Fri, Nov 20, 2020 at 3:27 PM Dénes Tóth <mailto:toth.de...@kogentum.hu>> wrote:


Or even more illustratively:

uneval_after_return <- function(x) {
    return(x) * stop("Not evaluated")
}
uneval_after_return(1)
# [1] 1

On 11/20/20 10:12 PM, Mateo Obregón wrote:
 > Dear r-developers-
 >
 > After many years of using and coding in R and other languages, I
came across
 > something that I think should be flagged by the parser:
 >
 > bug <- function (x) {
 >       return (x + 1) * 1000
 > }
 >> bug(1)
 > [1] 2
 >
 > The return() call is not like any other function call that
returns a value to
 > the point where it was called from. I think this should
straightforwardly be
 > handled in the parser by flagging it as a syntactic error.
 >
 > Thoughts?
 >
 > Mateo.
 > --
 > Mateo Obregón.
 >
 > __
 > R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel
 >

__
R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R check false positive - multiple versions of a dependency

2021-04-24 Thread Dénes Tóth




Disclaimer: I sent this report first to r-package-de...@r-project.org 
but it seems it has not been delivered to the list - re-trying to r-devel



Dear R maintainers,

Use case:
Restrict the acceptable versions of an imported package (e.g., 'pkg') to 
a closed interval. That is, provide *both* pkg (>= min.version.nr), pkg 
(<= max.version.nr) under Imports.


Problem:
Even though the package is an internal package, I want to have clean R 
CMD check results for QC reasons, and this seems impossible due to a bug 
in tools/R/QC.R/.check_package_description2.


Details:
This is a quote from Writing R Extensions, 1.1.3 Package Dependencies:

"A package or ‘R’ can appear more than once in the ‘Depends’ field, for 
example to give upper and lower bounds on acceptable versions."


In reality, this statement seems untrue: 1) only R can appear more than 
once (even base packages like 'stats' trigger a NOTE in R CMD check); 2) 
Not only 'Depends', but other fields (Imports, Suggests, Enhances) can 
contain duplicated entries in the sense that the entries are processed 
as expected, but the check gives a NOTE.


Minimal reproducible example:
In a (Linux) terminal, issue the following commands (note the Depends row):

#
mkdir -p pkgname
echo "
Depends: R (>= 3.1.0), R (<= 4.1.0)
Package: pkgname
Version: 0.5-1
Date: 2021-04-15
Title: My First Collection of Functions
Author: Joe Developer [aut, cre],
  Pat Developer [aut],
  A. User [ctb]
Maintainer: Joe Developer 
Description: A (one paragraph) description of what
  the package does and why it may be useful.
License: GPL (>= 2)
" > pkgname/DESCRIPTION

R CMD build pkgname
_R_CHECK_CRAN_INCOMING_REMOTE_=FALSE R CMD check pkgname_0.5-1.tar.gz 
--as-cran --no-manual

#

The commands above return with "Status: OK" - so far so good.

Now instead of restricting the R version, let us restrict the version of 
'stats'. (This is the only change, see Depends.)


#
echo "
Depends: stats (>= 0.0.0), stats (<= 10.0.0)
Package: pkgname
Version: 0.5-1
Date: 2021-04-15
Title: My First Collection of Functions
Author: Joe Developer [aut, cre],
  Pat Developer [aut],
  A. User [ctb]
Maintainer: Joe Developer 
Suggests: MASS
Description: A (one paragraph) description of what
  the package does and why it may be useful.
License: GPL (>= 2)
" > pkgname/DESCRIPTION
R CMD build pkgname
_R_CHECK_CRAN_INCOMING_REMOTE_=FALSE R CMD check pkgname_0.5-1.tar.gz 
--as-cran --no-manual

#

Now the status is "Status: 1 NOTE", and the note is:
"Package listed in more than one of Depends, Imports, Suggests, Enhances:
  ‘stats’
A package should be listed in only one of these fields."

Possible fix:
1) I think the highlighted sentence in Writing R Extensions should read as:
"A package or ‘R’ can appear more than once in the ‘Depends’ field, for 
example to give upper and lower bounds on acceptable versions. For 
packages, the same rule applies for ‘Imports’ and ‘Suggests’ fields (see 
later)."


2) In .check_package_description2(), 
'unique(allpkgs[duplicated(allpkgs)])' shall be replaced with a more 
elaborated check. BTW, that check appears twice in the function, where 
the first result is assigned to 'out' and is never used until 'out' gets 
re-assigned. See 
https://github.com/r-devel/r-svn/blob/0d65935f30dcaccfeee1dd61991bf4b1444873bc/src/library/tools/R/QC.R#L3553


If you agree this is a bug, I can create a formal bug report and 
probably create a patch, too.
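A possible shape for such a smarter check (a sketch only:
flag_duplicates() and its input format are hypothetical, not the
actual QC.R code):

```r
# 'deps' is assumed to hold one row per dependency entry parsed from a
# DESCRIPTION file. Flag a package only when it is listed in more than
# one field, or listed repeatedly in one field without distinct,
# explicit version requirements.
flag_duplicates <- function(deps) {
  bad <- character(0)
  for (pkg in unique(deps$name)) {
    d <- deps[deps$name == pkg, ]
    if (nrow(d) < 2L) next
    multi_field  <- length(unique(d$field)) > 1L
    vague_bounds <- anyDuplicated(d$version) > 0L || any(d$version == "")
    if (multi_field || vague_bounds) bad <- c(bad, pkg)
  }
  bad
}

deps <- data.frame(
  name    = c("stats", "stats", "MASS", "MASS"),
  field   = c("Depends", "Depends", "Imports", "Suggests"),
  version = c(">= 0.0.0", "<= 10.0.0", "", ""),
  stringsAsFactors = FALSE
)
flag_duplicates(deps)  # flags "MASS" but not the version-bounded "stats"
```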


Regards,
Denes

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R check false positive - multiple versions of a dependency

2021-04-25 Thread Dénes Tóth





On 4/24/21 5:52 PM, Duncan Murdoch wrote:
I'd say a NOTE is appropriate even if upper and lower limits are 
allowed, but the wording of the current note should be changed, e.g. 
your example should say


"Package listed more than once in Depends, Imports, Suggests, Enhances:
  ‘stats’"

If you really meant to do this, you can ignore the note, but I'd suspect 
multiple listings are more often an error than intentional, and that's 
what NOTEs are for.


I would say if a package is listed multiple times, but with different 
*explicit* version requirements and under the same heading (one and only 
one of Depends, Imports, Suggests, Enhances), it is valid and almost 
surely intentional. Currently the code which performs the check (and 
that I linked to) is not smart enough to distinguish between this 
particular use case and simple multiple listings of the same package 
dependency (which I agree can be assumed to be an error and not 
intentional).




There may still be a more serious bug here if one of the limits is 
ignored; I haven't checked that.


I checked it, and can confirm that *both* limits are considered. This 
supports my argument that this is a valid use case, and the NOTE could 
be avoided by a smarter check in the relevant part of 
.check_package_description2. I also understand this is a low-priority 
issue, so I do not expect someone from R-Core to spend time on fixing 
it. This is why I suggested giving it a try on my own if there is any 
chance that my patch will be accepted.


Regards,
Denes



Duncan Murdoch

On 21/04/2021 6:57 a.m., Dénes Tóth wrote:



Disclaimer: I sent this report first to r-package-de...@r-project.org
but it seems it has not been delivered to the list - re-trying to r-devel


Dear R maintainers,

Use case:
Restrict the acceptable versions of an imported package (e.g., 'pkg') to
a closed interval. That is, provide *both* pkg (>= min.version.nr), pkg
(<= max.version.nr) under Imports.

Problem:
Even though the package is an internal package, I want to have clean R
CMD check results for QC reasons, and this seems impossible due to a bug
in tools/R/QC.R/.check_package_description2.

Details:
This is a quote from Writing R Extensions, 1.1.3 Package Dependencies:

"A package or ‘R’ can appear more than once in the ‘Depends’ field, for
example to give upper and lower bounds on acceptable versions."

In reality, this statement seems untrue: 1) only R can appear more than
once (even base packages like 'stats' trigger a NOTE in R CMD check); 2)
Not only 'Depends', but other fields (Imports, Suggests, Enhances) can
contain duplicated entries in the sense that the entries are processed
as expected, but the check gives a NOTE.

Minimal reproducible example:
In a (Linux) terminal, issue the following commands (note the Depends 
row):


#
mkdir -p pkgname
echo "
Depends: R (>= 3.1.0), R (<= 4.1.0)
Package: pkgname
Version: 0.5-1
Date: 2021-04-15
Title: My First Collection of Functions
Author: Joe Developer [aut, cre],
    Pat Developer [aut],
    A. User [ctb]
Maintainer: Joe Developer 
Description: A (one paragraph) description of what
    the package does and why it may be useful.
License: GPL (>= 2)
" > pkgname/DESCRIPTION

R CMD build pkgname
_R_CHECK_CRAN_INCOMING_REMOTE_=FALSE R CMD check pkgname_0.5-1.tar.gz
--as-cran --no-manual
#

The commands above return with "Status: OK" - so far so good.

Now instead of restricting the R version, let us restrict the version of
'stats'. (This is the only change, see Depends.)

#
echo "
Depends: stats (>= 0.0.0), stats (<= 10.0.0)
Package: pkgname
Version: 0.5-1
Date: 2021-04-15
Title: My First Collection of Functions
Author: Joe Developer [aut, cre],
    Pat Developer [aut],
    A. User [ctb]
Maintainer: Joe Developer 
Suggests: MASS
Description: A (one paragraph) description of what
    the package does and why it may be useful.
License: GPL (>= 2)
" > pkgname/DESCRIPTION
R CMD build pkgname
_R_CHECK_CRAN_INCOMING_REMOTE_=FALSE R CMD check pkgname_0.5-1.tar.gz
--as-cran --no-manual
#

Now the status is "Status: 1 NOTE", and the note is:
"Package listed in more than one of Depends, Imports, Suggests, Enhances:
    ‘stats’
A package should be listed in only one of these fields."

Possible fix:
1) I think the highlighted sentence in Writing R Extensions should 
read as:

"A package or ‘R’ can appear more than once in the ‘Depends’ field, for
example to give upper and lower bounds on acceptable versions. For
packages, the same rule applies for ‘Imports’ and ‘Suggests’ fields (see
later)."

2) In .check_package_description2(),
'unique(allpkgs[duplicated(allpkgs)])' shall be replaced with a more
elaborated check. BTW, that check appears twice in the function, where
the first result is assigned to 'out' and is never used until 'out' gets
re-assigned. See
https://github.c

Re: [Rd] removeSource() vs. function literals

2023-03-31 Thread Dénes Tóth



On 3/31/23 08:49, Lionel Henry via R-devel wrote:

If you can afford a dependency on rlang, `rlang::zap_srcref()` deals
with this. It's recursive over expression vectors, calls (including
calls to `function` and their hidden srcref arg), and function
objects. It's implemented in C for efficiency as we found it to be a
bottleneck in some applications (IIRC caching). I'd be happy to
upstream this in base if R core is interested.


That would be very helpful. When having to implement caching, I have 
been hit by this issue several times in the past, too (before 
rlang::zap_srcref() existed).


Regards,
Denes





Best,
Lionel


On 3/30/23, Duncan Murdoch  wrote:

On 30/03/2023 10:32 a.m., Ivan Krylov wrote:

Dear R-devel,

In a package of mine, I use removeSource on expression objects in order
to make expressions that are semantically the same serialize to the
same byte sequences:
https://github.com/cran/depcache/blob/854d68a/R/fixup.R#L8-L34

Today I learned that expressions containing function definitions also
contain the source references for the functions, not as an attribute,
but as a separate argument to the `function` call:

str(quote(function() NULL)[[4]])
# 'srcref' int [1:8] 1 11 1 25 11 25 1 1
# - attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
#   

This means that removeSource() on an expression that would define a
function when evaluated doesn't actually remove the source reference
from the object.

Do you think it would be appropriate to teach removeSource() to remove
such source references? What could be a good way to implement that?
if (is.call(fn) && identical(fn[[1]], as.name("function"))) fn[[4]] <- NULL
sounds too arbitrary. if (inherits(fn, 'srcref')) return(NULL) sounds
too broad.
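One possible recursive shape (a sketch under strong assumptions: it
ignores environments, pairlists, and calls with empty arguments;
drop_srcrefs() is a hypothetical name, not an existing function):

```r
drop_srcrefs <- function(e) {
  if (is.call(e)) {
    # strip srcref-related attributes from the call itself
    attr(e, "srcref") <- NULL
    attr(e, "srcfile") <- NULL
    attr(e, "wholeSrcref") <- NULL
    # drop the hidden 4th (srcref) argument of a `function` call
    if (identical(e[[1L]], as.name("function")) && length(e) == 4L)
      e[[4L]] <- NULL
    # recurse; skip NULL elements so assignment does not delete them
    for (i in seq_along(e))
      if (!is.null(e[[i]])) e[[i]] <- drop_srcrefs(e[[i]])
  }
  e
}

e <- parse(text = "function() NULL", keep.source = TRUE)[[1]]
class(e[[4]])            # "srcref"
length(drop_srcrefs(e))  # 3: the srcref argument is gone
```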



I don't think there's a simple way to do that.  Functions can define
functions within themselves.  If you're talking about code that was
constructed by messing with language objects, it could contain both
function objects and calls to `function` to construct them.  You'd need
to recurse through all expressions in the object.  Some of those
expressions might be environments, so your changes could leak out of the
function you're working on.

Things are simpler if you know the expression is the unmodified result
of parsing source code, but if you know that, wouldn't you usually be
able to control things by setting keep.source = FALSE?

Maybe a workable solution is something like parse(deparse(expr, control
= "exact"), keep.source = FALSE).  Wouldn't work on environments or
various exotic types, but would probably warn you if it wasn't working.
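The round trip can be sketched as follows; note that parse() treats its
first argument as a file, so the deparsed text has to be passed via
text = :

```r
e <- parse(text = "function() NULL", keep.source = TRUE)[[1]]
class(e[[4]])  # "srcref": the hidden fourth argument is populated
e2 <- parse(text = deparse(e, control = "exact"),
            keep.source = FALSE)[[1]]
e2[[4]]        # NULL: the srcref slot is empty after the round trip
```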

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





Re: [R-pkg-devel] Package required but not available: ‘arrow’

2024-02-22 Thread Dénes Tóth
Depending on your use case you can also take a look at the nanoarrow 
package (https://cran.r-project.org/package=nanoarrow). Maybe it 
provides all the features you need and has a much smaller footprint than 
'arrow'.


Best,
Denes


On 2/22/24 10:01, Duncan Murdoch wrote:
If you look on the CRAN check results for arrow, you'll see it has 
errors on the Linux platforms that use clang, and can't be installed there.


For you to deal with this, you should make arrow into a suggested 
package, and if it is missing, work around that without generating an 
error.  Another choice would be to work with the arrow developers to get 
it to install on the systems where it fails now, but it's a big package, 
so that would likely be a lot harder.
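The Suggests-based approach can be sketched like this (the wrapper name
is illustrative; arrow::read_parquet() is a real arrow function, but
check the arrow documentation for the exact reader you need):

```r
# In DESCRIPTION, list arrow under Suggests instead of Imports, then
# guard every use of it at run time:
read_parquet_file <- function(path) {
  if (requireNamespace("arrow", quietly = TRUE)) {
    arrow::read_parquet(path)
  } else {
    stop("Package 'arrow' is needed to read parquet files; ",
         "please install it or supply the data in another format.",
         call. = FALSE)
  }
}
```

The same if (requireNamespace(...)) guard should wrap any examples,
tests, or vignette chunks that touch arrow.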


Duncan Murdoch

On 21/02/2024 5:15 p.m., Park, Sung Jae wrote:

Hi,

I’m writing to seek assistance regarding an issue we’re encountering 
during the submission process of our new package to CRAN.
The package in question is currently working smoothly on R CMD check 
on Windows; however, we are facing a specific error when running R CMD 
check on Debian. The error message we’ve got from CRAN is as follows:


```
❯ checking package dependencies ... ERROR
   Package required but not available: ‘arrow’

   See section ‘The DESCRIPTION file’ in the ‘Writing R Extensions’
manual.
```

We have ensured that the ‘arrow’ package is properly listed in the 
DESCRIPTION file under ‘Imports:’.
Could you please provide guidance on how to resolve this? Any help 
will be valuable.


Thank you in advance.

Best,
--Sungjae




__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

