On 22/07/2020 3:29 p.m., Pan Domu wrote:
I ran into strange behavior when removing names.

Two ways of removing names:

     i <- rep(1:4, length.out=20000)
     k <- c(a=1, b=2, c=3, d=4)

     x1 <- unname(k[i])
     x2 <- k[i]
     x2 <- unname(x2)

Are they identical?

     identical(x1,x2) # TRUE

but no

     identical(serialize(x1,NULL),serialize(x2,NULL)) # FALSE

But the problem is with serialization version 3, because:

     identical(serialize(x1,NULL,version = 2),serialize(x2,NULL,version = 2)) # TRUE

It seems that the second one keeps names somewhere invisibly.
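One way to check whether an object carries such a hidden payload is to compare how many bytes it serializes to under format version 3 versus version 2. The following is only a sketch: `serialize_overhead` is a hypothetical helper name, not part of base R, and whether `x2` actually shows a large difference depends on the R version in use.

```r
# Hypothetical helper: extra bytes that serialization version 3 writes
# compared with version 2 for the same object.
serialize_overhead <- function(x) {
  length(serialize(x, NULL, version = 3)) -
    length(serialize(x, NULL, version = 2))
}

i <- rep(1:4, length.out = 20000)
k <- c(a = 1, b = 2, c = 3, d = 4)

x1 <- unname(k[i])
x2 <- k[i]
x2 <- unname(x2)

# A plain vector differs only by the slightly longer version 3 header;
# a much larger number suggests extra data is being written with it.
print(c(x1 = serialize_overhead(x1), x2 = serialize_overhead(x2)))
```

A difference of a few bytes is just the header; a difference on the order of the vector length points at something extra being serialized.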

Some functions can lose them, e.g. head:

     identical(serialize(head(x1, 20001),NULL),serialize(head(x2, 20001),NULL)) # TRUE

saveRDS does not drop them (so files are bigger). The tibble family keeps
them, but base data.frame seems to drop them.

From my tests, invisible names are kept in the following cases:

    library(magrittr)  # provides %>%
    x1 <- k[i] %>% unname()
    x3 <- k[i]; x3 <- unname(x3)
    x5 <- k[i]; x5 <- `names<-`(x5, NULL)
    x6 <- k[i]; x6 <- unname(x6)

but not in these:

    x2 <- unname(k[i])
    x4 <- k[i]; names(x4) <- NULL

What kind of magic is that?

It hit us when we upgraded from 3.5 (when serialization changed) and had
an impact on parallelization (because serialized objects were bigger).

You can use .Internal(inspect(x1)) and .Internal(inspect(x2)) to see that the two objects are not identical:

> .Internal(inspect(x1))
@1116b7000 14 REALSXP g0c7 [REF(2)] (len=20000, tl=0) 1,2,3,4,1,...
> .Internal(inspect(x2))
@7f9c77664ce8 14 REALSXP g0c0 [REF(2)]  wrapper [srt=-2147483648,no_na=0]
  @10e7b7000 14 REALSXP g0c7 [REF(6),ATT] (len=20000, tl=0) 1,2,3,4,1,...
  ATTRIB:
    @7f9c77664738 02 LISTSXP g0c0 [REF(1)]
      TAG: @7f9c6c027890 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] "names" (has value)
      @10e3ac000 16 STRSXP g0c7 [REF(65535)] (len=20000, tl=0)
        @7f9c6ab531c8 09 CHARSXP g1c1 [MARK,REF(10066),gp=0x61] [ASCII] [cached] "a"
        @7f9c6ae9a678 09 CHARSXP g1c1 [MARK,REF(10013),gp=0x61] [ASCII] [cached] "b"
        @7f9c6c0496c0 09 CHARSXP g1c1 [MARK,REF(10568),gp=0x61,ATT] [ASCII] [cached] "c"
        @7f9c6ad3df40 09 CHARSXP g1c1 [MARK,REF(10029),gp=0x61,ATT] [ASCII] [cached] "d"
        @7f9c6ab531c8 09 CHARSXP g1c1 [MARK,REF(10066),gp=0x61] [ASCII] [cached] "a"
        ...


It looks as though x2 is a tiny ALTREP object acting as a wrapper on the original k[i], but I might be misinterpreting those displays. I don't know how to force ALTREP objects to standard representation: unserializing the serialized x2 gives something like x2, not like x1. Maybe you want to look at one of the contributed low-level packages. The stringfish package has a "materialize" function that is advertised to convert anything to standard format, but it doesn't change x2.
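One possible workaround, sketched here as an assumption to verify on your own R version rather than a confirmed fix: serialization format version 2 predates ALTREP, so a round trip through version 2 should be read back as an ordinary vector. The name `x2_plain` is only illustrative.

```r
i <- rep(1:4, length.out = 20000)
k <- c(a = 1, b = 2, c = 3, d = 4)

x2 <- k[i]
x2 <- unname(x2)

# Round trip through format version 2, which has no ALTREP records,
# so the value is read back as an ordinary vector.
x2_plain <- unserialize(serialize(x2, NULL, version = 2))

# Compare serialized sizes before and after the round trip.
print(c(before = length(serialize(x2, NULL)),
        after  = length(serialize(x2_plain, NULL))))
```

The round trip copies the data, so it costs time and memory proportional to the vector length; it may only be worth doing just before objects are shipped to workers or written to disk.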

Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
