On Mon, Nov 10, 2014 at 12:35 PM, Dr Gregory Jefferis
<jeffe...@mrc-lmb.cam.ac.uk> wrote:
> Dear R-devel,
>
> Can anyone help me to understand this? It seems that subscripting the rows
> of a data.frame without actually changing their order, somehow changes an
> internal representation of row.names that is revealed by e.g.
> dput/dump/serialize
>
> I have read the docs and inspected the (R) code for data.frame, rownames,
> row.names and dput without enlightenment.
>
Look at ?.row_names_info (which is mentioned in the See Also section
of ?row.names) and its type argument. Also see the discussion here:
http://stackoverflow.com/q/26468746/271616
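
A minimal sketch of what the type argument of .row_names_info exposes,
using the df and df2 from the quoted example. Output is deliberately not
shown; the quoted dput() results report row.names = c(NA, -10L) for df
and c(NA, 10L) for df2 on R 3.1.x, and other R versions may differ.

```r
# The data frames from the quoted example
df  <- data.frame(a = 1:10, b = 1)
df2 <- df[1:nrow(df), ]

# type = 0L: the internal "row.names" attribute exactly as stored
.row_names_info(df, 0L)
.row_names_info(df2, 0L)

# type = 2L: the number of rows implied by the attribute
.row_names_info(df, 2L)

# type = 1L (the default): as type 2, but negative for "automatic" row names
.row_names_info(df)
.row_names_info(df2)

# The public accessor returns the same character vector for both,
# which is why the two objects look equal at the R level
identical(row.names(df), row.names(df2))
```

A negative value from the default call means the row names are
"automatic"; a positive one means they are stored, possibly in compact
form.
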
> df=data.frame(a=1:10, b=1)
> dput(df)
> df2=df[1:nrow(df), ]
> # R thinks they are equal (so do I!)
> all.equal(df, df2)
> dput(df2)
>
> Looking at the output of the dputs
>
>> dput(df)
>
> structure(list(a = 1:10, b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names =
> c("a",
> "b"), row.names = c(NA, -10L), class = "data.frame")
>>
>> dput(df2)
>
> structure(list(a = 1:10, b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names =
> c("a",
> "b"), row.names = c(NA, 10L), class = "data.frame")
>
> we have row.names = c(NA, -10L) in the first case and row.names = c(NA, 10L)
> in the second, so somehow these objects have a different representation
>
> Can anyone explain why? This has come up because
>
The first are "automatic". The second are a compact form of 1:10, as
mentioned in ?row.names. I'm not certain of the root cause/reason,
but the second object will not have "automatic" rownames because you
have subset it with a non-missing 'i'.

>> library(digest)
>> digest(df)==digest(df2)
>
> [1] FALSE
>
> digest uses serialize under the hood, but serialize, dput and dump all show
> the same effect (I've pasted an example below using dump, md5sum from base
> R).
>
> Many thanks for any enlightenment! More generally is there any way to
> calculate a digest of a data.frame that could get round this issue or is
> that not possible?
>
> Best wishes,
>
> Greg.
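
One possible workaround, sketched under assumptions: reset the row names
to the "automatic" form before hashing, so that both objects store them
the same way. normalize_row_names is a hypothetical helper, not part of
base R or digest, and it throws away any meaningful row names, so it only
suits data frames whose row names are just 1:n.

```r
# Hypothetical helper: force "automatic" row names on a data frame.
# Assigning NULL is the documented way to reset row names (see ?row.names).
normalize_row_names <- function(x) {
  stopifnot(is.data.frame(x))
  rownames(x) <- NULL
  x
}

df  <- data.frame(a = 1:10, b = 1)
df2 <- df[1:nrow(df), ]

n1 <- normalize_row_names(df)
n2 <- normalize_row_names(df2)

# Both now carry the same internal row.names attribute ...
identical(.row_names_info(n1, 0L), .row_names_info(n2, 0L))

# ... and compare as identical at the R level
identical(n1, n2)
```

Whether digest(n1) and digest(n2) then agree is not something this sketch
guarantees: serialize may still pick up other differences in how the two
objects are stored, so check it on your own R version.
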
>
>
> A digest using base R:
>
> library(tools)
> td=tempfile()
> dir.create(td)
> tempfiles=file.path(td,c("df", "df2"))
> dump("df",tempfiles[1])
> dump("df2",tempfiles[2])
> md5sum(tempfiles)
>
> # different md5sum
>
>> sessionInfo() # for my laptop but also observed on R 3.1.2
>
> R version 3.1.1 (2014-07-10)
> Platform: x86_64-apple-darwin13.1.0 (64-bit)
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] tools stats graphics grDevices utils datasets methods
> base
>
> other attached packages:
> [1] nat_1.5.14 nat.utils_0.4.2 digest_0.6.4 Rvcg_0.9
> devtools_1.6.1 igraph_0.7.1
> [7] testthat_0.9.1 rgl_0.93.1098
>
> loaded via a namespace (and not attached):
> [1] codetools_0.2-9 filehash_2.2-2 nabor_0.4.3 parallel_3.1.1
> plyr_1.8.1
> [6] Rcpp_0.11.3 rstudio_0.98.1062 rstudioapi_0.1 XML_3.98-1.1
> yaml_2.1.13
>
> --
> Gregory Jefferis, PhD
> Division of Neurobiology
> MRC Laboratory of Molecular Biology
> Francis Crick Avenue
> Cambridge Biomedical Campus
> Cambridge, CB2 OQH, UK
>
> http://www2.mrc-lmb.cam.ac.uk/group-leaders/h-to-m/g-jefferis
> http://jefferislab.org
> http://flybrain.stanford.edu
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com