Hi Mathieu,

I don't have a full explanation, but here are some additional observations:

> options(digits = 4)
>
> ## Simplified example
> df2 <- data.frame(x = rnorm(21), y = rnorm(21), id = 99990:100010)
> apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
 [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996" " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>
> ## Based on magnitude of id (99995 to 99999 get padded regardless of position)
> df2 <- data.frame(x = rnorm(21), y = rnorm(21), id = 100010:99990)
> apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
 [1] "100010" "100009" "100008" "100007" "100006" "100005" "100004" "100003" "100002" "100001" "100000" " 99999" " 99998" " 99997"
[15] " 99996" " 99995" "99994"  "99993"  "99992"  "99991"  "99990"
>
> ## The issue is that formatting a double leads to the originally noted
> ## behavior. The apply version coerces df2 to a matrix of type double,
> ## which is why this happens there as well.
>
> for(i in 1:nrow(df2)) print(format(df2[i, "id"], scientific=FALSE))
[1] "100010"
[1] "100009"
[1] "100008"
[1] "100007"
[1] "100006"
[1] "100005"
[1] "100004"
[1] "100003"
[1] "100002"
[1] "100001"
[1] "100000"
[1] "99999"
[1] "99998"
[1] "99997"
[1] "99996"
[1] "99995"
[1] "99994"
[1] "99993"
[1] "99992"
[1] "99991"
[1] "99990"
> for(i in 1:nrow(df2)) print(format(as.double(df2[i, "id"]), scientific=FALSE))
[1] "100010"
[1] "100009"
[1] "100008"
[1] "100007"
[1] "100006"
[1] "100005"
[1] "100004"
[1] "100003"
[1] "100002"
[1] "100001"
[1] "100000"
[1] " 99999"
[1] " 99998"
[1] " 99997"
[1] " 99996"
[1] " 99995"
[1] "99994"
[1] "99993"
[1] "99992"
[1] "99991"
[1] "99990"

Best,
Ista

On Thu, Aug 1, 2013 at 11:31 AM, Mathieu Basille
<basille....@ase-research.org> wrote:
> This problem does not seem to be widespread, but it affects at least two
> users (both on Linux, maybe a hint here?). To me, it looks like a bug (is it
> an R bug or an OS-related bug, I don't know).
> Should I forward it to R-devel, or some other place where R gurus may have
> a chance to look at it?
>
> Mathieu.
>
>
> On 07/30/2013 02:34 PM, arun wrote:
>
>> Hi Mathieu,
>> yes, the original problem occurs on my system too. I am using R 3.0.1 on
>> Linux Mint 15. I guess the default case would be trim=FALSE, but still it
>> looks very strange, especially in ?apply(), as it starts from " 99995"
>> onwards.
>>
>> sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] stringr_0.6.2  reshape2_1.2.2
>>
>> loaded via a namespace (and not attached):
>> [1] plyr_1.8    tools_3.0.1
>>
>>
>> ----- Original Message -----
>> From: Mathieu Basille <basille....@ase-research.org>
>> To: arun <smartpink...@yahoo.com>
>> Cc: R help <r-help@r-project.org>
>> Sent: Tuesday, July 30, 2013 2:29 PM
>> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on
>> 'options(digits = K)'
>>
>> Thanks Arun for your answer. 'trim = TRUE' does indeed solve the symptoms
>> of the problem, and this is the solution I'm currently using. However, it
>> does not help to understand what the problem is or what causes it.
>>
>> Can you confirm that the original problem also occurs on your computer
>> (and what is your OS)? It would be interesting since David is not able to
>> reproduce the problem with Mac OS X.
>> Mathieu.
>>
>>
>> On 07/30/2013 02:15 PM, arun wrote:
>>>
>>> Hi,
>>> Try using trim=TRUE, in ?format()
>>> options(digits=4)
>>>
>>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"],
>>>                  trim=TRUE, scientific = FALSE))
>>> df2$id2[99990:100010]
>>> #  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>>> #  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
>>> # [17] "100006" "100007" "100008" "100009" "100010"
>>>
>>>
>>> id2 <- format(1:110000, scientific = FALSE, trim=TRUE)
>>> id2[99990:100010]
>>> #  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>>> #  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
>>> # [17] "100006" "100007" "100008" "100009" "100010"
>>> A.K.
>>>
>>>
>>> ----- Original Message -----
>>> From: Mathieu Basille <basille....@ase-research.org>
>>> To: David Winsemius <dwinsem...@comcast.net>
>>> Cc: r-help@r-project.org
>>> Sent: Tuesday, July 30, 2013 2:07 PM
>>> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on
>>> 'options(digits = K)'
>>>
>>> Thanks David for your interest. I have to admit that your answer puzzles
>>> me even more than before. It seems that the underlying problem is way
>>> beyond my R skills...
>>>
>>> The generation of id2 is indeed quite demanding, especially compared to a
>>> simple 'as.character' call.
>>> Anyway, since it seems to be system specific, here is the sessionInfo()
>>> that I forgot to attach to my first message:
>>>
>>> R version 3.0.1 (2013-05-16)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
>>>  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
>>>  [7] LC_PAPER=C                 LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> In brief: last stable R available under Debian Testing... Hopefully this
>>> can help track down the problem.
>>> Mathieu.
>>>
>>>
>>> On 07/30/2013 01:58 PM, David Winsemius wrote:
>>>>
>>>>
>>>> On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:
>>>>
>>>>> Dear list,
>>>>>
>>>>> Here is a simple example in which the behaviour of 'format' does not
>>>>> make sense to me. I have read the documentation and searched the
>>>>> archives, but nothing pointed me in the right direction to understand
>>>>> this behaviour.
>>>>> Let's start with a simple data frame:
>>>>>
>>>>> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>>>>>
>>>>> Let's now create a new variable 'id2' which is the character
>>>>> representation of 'id'. Note that I use 'scientific = FALSE' to ensure
>>>>> that long numbers such as 100,000 are not formatted using their
>>>>> scientific representation (in this case 1e+05):
>>>>>
>>>>> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>>>>>
>>>>> Let's have a look at part of the result:
>>>>>
>>>>> df1$id2[99990:100010]
>>>>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
>>>>>  [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>>>
>>>>
>>>> Some formatting processes are carried out by system functions.
>>>> In this case I am unable to reproduce with the same code on a Mac OS
>>>> 10.7.5/R 3.0.1 Patched
>>>>
>>>>> df1$id2[99990:100010]
>>>>
>>>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>>>>  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
>>>> [17] "100006" "100007" "100008" "100009" "100010"
>>>>
>>>> (I did notice that generation of the id2 variable seemed to take an
>>>> inordinately long time.)
>>>>
>>>> -- David.
>>>>>
>>>>>
>>>>> So far, so good. Let's now play with the 'digits' option:
>>>>>
>>>>> options(digits = 4)
>>>>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>>>>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>>>>> df2$id2[99990:100010]
>>>>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
>>>>>  [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>>>>
>>>>> Notice the extra leading space from 99995 to 99999? To make sure it
>>>>> only happened there:
>>>>>
>>>>> df2$id2[which(df1$id2 != df2$id2)]
>>>>> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
>>>>>
>>>>> And just to make sure it only occurs in an 'apply' call, here is the
>>>>> same directly on a numeric vector:
>>>>>
>>>>> id2 <- format(1:110000, scientific = FALSE)
>>>>> id2[99990:100010]
>>>>>  [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
>>>>>  [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>>>>
>>>>> Here the leading spaces are for every number, which makes sense to me.
>>>>> Is there anything I'm misinterpreting in the behaviour of 'format'?
>>>>> Thanks in advance for any hint,
>>>>> Mathieu.
>>>>>
>>>>>
>>>>> PS: Some background for this question.
>>>>> It all comes from an Rmd document that knitr consistently failed to
>>>>> process, while the R code was fine using batch or interactive R. knitr
>>>>> uses 'options(digits = 4)' as opposed to 'options(digits = 7)' by
>>>>> default in R, which made one of my functions throw an error with knitr,
>>>>> but not with batch or interactive R. I managed to solve the problem
>>>>> using 'trim = TRUE' in 'format', but I still do not understand what's
>>>>> going on...
>>>>> If you're interested, see here for more details on the original
>>>>> problem:
>>>>> http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~$ whoami
>>>>> Mathieu Basille, PhD
>>>>>
>>>>> ~$ locate --details
>>>>> University of Florida \\
>>>>> Fort Lauderdale Research and Education Center
>>>>> (+1) 954-577-6314
>>>>> http://ase-research.org/basille
>>>>>
>>>>> ~$ fortune
>>>>> « Le tout est de tout dire, et je manque de mots
>>>>> Et je manque de temps, et je manque d'audace. »
>>>>> -- Paul Éluard
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>> David Winsemius
>>>> Alameda, CA, USA
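
The integer-versus-double observation at the top of this thread can be checked in isolation. The following is only a minimal sketch: the object names (x_int, x_dbl, fmt_int, fmt_dbl, fmt_trim, df) are made up for illustration. The padded result for the double is what was reported in this thread on Linux with R 3.0.1, and it may differ on other platforms or versions, so the code prints it instead of assuming it. One possible reading, which is an assumption and not something confirmed in the thread, is that the field width is derived from the value rounded to 'digits' significant digits (99995 becomes 1.000e+05 at 4 digits, hence 6 characters), while the digits printed are those of the unrounded value.

```r
## Sketch only: object names are hypothetical, not taken from the thread.
old <- options(digits = 4)        # the setting the thread attributes to knitr

x_int <- 99995L                   # integer, like the 'id' column built with 1:110000
x_dbl <- as.double(x_int)         # what each row holds once apply() has coerced the data frame

fmt_int  <- format(x_int, scientific = FALSE)
fmt_dbl  <- format(x_dbl, scientific = FALSE)               # reported as " 99995" in this thread
fmt_trim <- format(x_dbl, scientific = FALSE, trim = TRUE)  # workaround suggested by arun

print(c(integer = fmt_int, double = fmt_dbl, trimmed = fmt_trim))
print(nchar(c(fmt_int, fmt_dbl, fmt_trim)))

## apply() works on as.matrix(df); mixing a double column with an
## integer column gives a matrix of type double.
df <- data.frame(x = c(0.5, 1.5), id = 99995:99996)
print(typeof(df$id))              # storage type of the column itself
print(typeof(as.matrix(df)))      # storage type after coercion to a matrix

options(old)                      # restore the previous 'digits' setting
```

If the ids only need to be turned into strings, formatting the integer column directly (or passing trim = TRUE, as already suggested above) avoids depending on the 'digits' option at all.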