On 09/03/2018 03:59 PM, Dénes Tóth wrote:
Hi Tomas,
On 09/03/2018 11:49 AM, Tomas Kalibera wrote:
Please don't do this to get the underlying vector length (or to
achieve anything else). Setting/deleting attributes of an R object
without checking the reference count violates R semantics, which in
turn can have unpredictable results on R programs (essentially
undebuggable segfaults now or more likely later when new
optimizations or features are added to the language). Setting
attributes on objects with reference count (currently NAMED value)
greater than 0 (in some special cases 1 is ok) is cheating - please
see Writing R Extensions - and getting speedups via cheating leads to
fragile, unmaintainable and buggy code.
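A minimal sketch of the kind of violation meant here (assuming data.table is installed): two bindings initially share one underlying object, so setting an attribute by reference through one binding silently changes the other.
---
x <- c(1, 2, 3)
y <- x                                   # no copy yet: both bindings share one vector
data.table::setattr(y, "class", "foo")   # sets the attribute in place, bypassing the reference count
class(x)                                 # "foo" -- x was changed through y
---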
Hi Denes,
Please note that data.table::setattr is an exported function of a
widely used package (available from CRAN); its help page
(?data.table::setattr) also describes why it might be useful.
Indeed, and that is not your fault, but the function is cheating, and the
fact that it is in a widely used package, and even exported from it, does
not make it any safer. The related optimization in base R (shallow
copying) mentioned in the documentation of data.table::setattr is, on the
other hand, sound: it does not break the semantics.
Of course one has to use the set* functions from data.table with extreme
care, but if one does it in the right way, they can help a lot. For
example, there is no real danger in using them in internal functions
where one can control what gets passed to the function or created
within the function (so when one knows that the refcount==0 condition
is true).
Extreme care is not enough, as the internals can and do change (and
within the limits given by the documentation, they are likely to change
soon with respect to NAMED/reference counting), not to mention that they
are very complicated. The approach of "modify in place because we know
the reference count is 0" is particularly error prone and unnecessary.
It is unnecessary because there is a documented C API (e.g. the
MAYBE_REFERENCED and MAYBE_SHARED macros described in Writing R
Extensions) for legitimate use in packages to find out whether an object
may be referenced/shared (it indirectly checks the reference count). If
not, the object can be modified in place without cheating, and some
packages do this. It is error prone because the reference count can
change due to many things package developers cannot be expected to know
(and again, these things change): in set* functions, for example, it
will never be 0 (!), so these functions with their current API can never
be implemented in current R without breaking the semantics.
In principle one can do similar things legitimately by wrapping objects
in an environment and passing that environment around (environments can
legitimately be modified in place), checking that the contained objects
have a reference count of 1 (not shared), and if so, modifying them in
place. But indeed, as soon as such objects become shared, there is no
way out; one has to copy (in current R).
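A minimal sketch of the environment-wrapping idea (make_box and value
are hypothetical names, for illustration only): the environment itself
is legitimately mutated in place, so callers holding the wrapper see
updates without any attribute cheating.
---
make_box <- function(value) {
  box <- new.env(parent = emptyenv())
  box$value <- value
  box
}
b <- make_box(1:5)
b$value[2] <- 99L   # rebinds inside the environment; R's usual
                    # copy-on-modify still protects shared payloads
stopifnot(identical(b$value[2], 99L))
---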
Best
Tomas
(Notwithstanding the above, but also supporting your argument: it
took me hours to debug a particular problem in one of my internal
packages, see https://github.com/Rdatatable/data.table/issues/1281)
In the present case, an important and unanswered question is (cited
from Henrik):
>>> However, I'm concerned that calling unclass(x) may trigger an
>>> expensive copy internally in some cases. Is that concern unfounded?
If no copy is made, length(unclass(x)) beats length(setattr(..)) in
all scenarios.
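One way to probe that empirically (a sketch, not an answer):
base::tracemem() reports duplications of a traced object, provided R
was compiled with memory profiling (the default for CRAN builds).
---
x <- structure(list(a = 1, b = list(b1 = 1, b2 = 2)), class = "Foo")
tracemem(x)                 # a message is printed whenever 'x' is duplicated
len <- length(unclass(x))   # a "tracemem[...]" line here means a copy was made
untracemem(x)
len
---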
Doing so in packages is particularly unhelpful to the whole community
- packages should only use the public API as documented.
Similarly, getting the physical address of an object to detect whether
R has copied it should certainly not be done in packages, and R code
should never work with, or even obtain, the physical address of an
object. This is also why one cannot obtain such an address using base R
(apart from, in textual form, certain diagnostic messages where it can
indeed be useful for low-level debugging).
Getting the physical address of the object was done exclusively for
demonstration purposes. I totally agree that it should not be used for
the purpose you described, and I have never done so.
Regards,
Denes
Tomas
On 09/02/2018 01:19 AM, Dénes Tóth wrote:
The solution below introduces a dependency on data.table, but
otherwise it does what you need:
---
# special method for Foo objects
length.Foo <- function(x) {
  length(unlist(x, recursive = TRUE, use.names = FALSE))
}

# an instance of a Foo object
x <- structure(list(a = 1, b = list(b1 = 1, b2 = 2)), class = "Foo")

# its length
stopifnot(length(x) == 3L)

# get its length as if it were a standard list
.length <- function(x) {
  cls <- class(x)
  # setattr() does not make a copy, but modifies by reference
  data.table::setattr(x, "class", NULL)
  # get the length
  len <- base::length(x)
  # re-set the original classes
  data.table::setattr(x, "class", cls)
  # return the unclassed length
  len
}
# to check that we do not make unwanted changes
orig_class <- class(x)
# check that the address in RAM does not change
a1 <- data.table::address(x)
# 'unclassed' length
stopifnot(.length(x) == 2L)
# check that address is the same
stopifnot(a1 == data.table::address(x))
# check against original class
stopifnot(identical(orig_class, class(x)))
---
On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:
Is there a low-level function that returns the length of an object 'x'
- the length that for instance .subset(x) and .subset2(x) see? An
obvious candidate would be to use:
.length <- function(x) length(unclass(x))
However, I'm concerned that calling unclass(x) may trigger an
expensive copy internally in some cases. Is that concern unfounded?
Thxs,
Henrik