On 22/07/2020 3:29 p.m., Pan Domu wrote:
I ran into strange behavior when removing names.
Two ways of removing names:
i <- rep(1:4, length.out=20000)
k <- c(a=1, b=2, c=3, d=4)
x1 <- unname(k[i])
x2 <- k[i]
x2 <- unname(x2)
Are they identical?
identical(x1,x2) # TRUE
but their serialized forms differ:
identical(serialize(x1,NULL),serialize(x2,NULL)) # FALSE
But the problem only shows up with serialization version 3, because:
identical(serialize(x1, NULL, version = 2), serialize(x2, NULL, version = 2)) # TRUE
It seems that the second one keeps names somewhere invisibly.
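To put a number on the difference, a minimal sketch along these lines compares the serialized sizes of the two vectors under both format versions. The helper name ser_size is made up for illustration, and the version 3 sizes may well depend on the R version in use, so no particular output is claimed here.

```r
# Rebuild the two vectors from the post.
i <- rep(1:4, length.out = 20000)
k <- c(a = 1, b = 2, c = 3, d = 4)
x1 <- unname(k[i])
x2 <- k[i]
x2 <- unname(x2)

# Hypothetical helper (not part of base R): number of bytes that
# serialize() produces for x under the given format version.
ser_size <- function(x, version = 3) {
  length(serialize(x, NULL, version = version))
}

sizes <- c(
  x1_v3 = ser_size(x1, 3),
  x2_v3 = ser_size(x2, 3),
  x1_v2 = ser_size(x1, 2),
  x2_v2 = ser_size(x2, 2)
)
print(sizes)
```

If x2 really carries the names along, x2_v3 should come out noticeably larger than x1_v3, while the two version 2 sizes should match.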
Some functions can lose them, e.g. head:
identical(serialize(head(x1, 20001), NULL), serialize(head(x2, 20001), NULL)) # TRUE
saveRDS does not drop them (so files are bigger). The tibble family keeps
them, but base data.frame seems to drop them.
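The file-size effect can be checked with a small sketch like the one below. The helper rds_size is hypothetical. It relies on the version argument of saveRDS, and it assumes that writing in format version 2 is acceptable for the use case. The default-version sizes may differ between R releases.

```r
i <- rep(1:4, length.out = 20000)
k <- c(a = 1, b = 2, c = 3, d = 4)
x1 <- unname(k[i])
x2 <- k[i]
x2 <- unname(x2)

# Hypothetical helper: write x to a temporary file with saveRDS()
# and return the size of that file in bytes.
rds_size <- function(x, version = NULL) {
  f <- tempfile(fileext = ".rds")
  on.exit(unlink(f))
  saveRDS(x, f, version = version)
  file.size(f)
}

rds_sizes <- c(
  x1_default = rds_size(x1),
  x2_default = rds_size(x2),
  x1_v2 = rds_size(x1, version = 2),
  x2_v2 = rds_size(x2, version = 2)
)
print(rds_sizes)
```

Since the version 2 byte streams of x1 and x2 are identical, the two version 2 files should be the same size.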
From my tests, the invisible names are kept in the following cases:
x1 <- k[i] %>% unname() # %>% is from magrittr
x3 <- k[i]; x3 <- unname(x3)
x5 <- k[i]; x5 <- `names<-`(x5, NULL)
x6 <- k[i]; x6 <- unname(x6)
but not in these:
x2 <- unname(k[i])
x4 <- k[i]; names(x4) <- NULL
What kind of magic is that?
It hit us when we upgraded from 3.5 (when serialization changed) and had
an impact on parallelization (because serialized objects were bigger).
You can use .Internal(inspect(x1)) and .Internal(inspect(x2)) to see
that the two objects are not identical:
> .Internal(inspect(x1))
@1116b7000 14 REALSXP g0c7 [REF(2)] (len=20000, tl=0) 1,2,3,4,1,...
> .Internal(inspect(x2))
@7f9c77664ce8 14 REALSXP g0c0 [REF(2)] wrapper [srt=-2147483648,no_na=0]
  @10e7b7000 14 REALSXP g0c7 [REF(6),ATT] (len=20000, tl=0) 1,2,3,4,1,...
  ATTRIB:
    @7f9c77664738 02 LISTSXP g0c0 [REF(1)]
      TAG: @7f9c6c027890 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] "names" (has value)
      @10e3ac000 16 STRSXP g0c7 [REF(65535)] (len=20000, tl=0)
        @7f9c6ab531c8 09 CHARSXP g1c1 [MARK,REF(10066),gp=0x61] [ASCII] [cached] "a"
        @7f9c6ae9a678 09 CHARSXP g1c1 [MARK,REF(10013),gp=0x61] [ASCII] [cached] "b"
        @7f9c6c0496c0 09 CHARSXP g1c1 [MARK,REF(10568),gp=0x61,ATT] [ASCII] [cached] "c"
        @7f9c6ad3df40 09 CHARSXP g1c1 [MARK,REF(10029),gp=0x61,ATT] [ASCII] [cached] "d"
        @7f9c6ab531c8 09 CHARSXP g1c1 [MARK,REF(10066),gp=0x61] [ASCII] [cached] "a"
        ...
It looks as though x2 is a tiny ALTREP object acting as a wrapper on the
original k[i], but I might be misinterpreting those displays. I don't
know how to force ALTREP objects to standard representation:
unserializing the serialized x2 gives something like x2, not like x1.
Maybe you want to look at one of the contributed low level packages.
The stringfish package has a "materialize" function that is advertised
to convert anything to standard format, but it doesn't change x2.
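One possible workaround, offered only as a sketch: as far as I understand, the version 2 serialization format has no ALTREP support, so a round trip through it should hand back an ordinary vector. The helper name plain_copy is made up for illustration, and the approach costs a full copy of the data.

```r
i <- rep(1:4, length.out = 20000)
k <- c(a = 1, b = 2, c = 3, d = 4)
x1 <- unname(k[i])
x2 <- k[i]
x2 <- unname(x2)

# Hypothetical helper: serialize in format version 2 and read the
# bytes back. Assumption: version 2 cannot represent ALTREP objects,
# so the result is a standard vector without the wrapped names.
plain_copy <- function(x) {
  unserialize(serialize(x, NULL, version = 2))
}

x2_plain <- plain_copy(x2)

# The values are unchanged, and the default serialization of the
# copy should now be the same length as that of x1.
print(identical(x2_plain, x1))
print(length(serialize(x2_plain, NULL)) == length(serialize(x1, NULL)))
```

Running .Internal(inspect(x2_plain)) afterwards is a quick way to check whether the wrapper is really gone.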
Duncan Murdoch
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel