Re: [R] Plot for 10 years extrapolation
Incidentally, if all you wanted to do was plot fitted values, the predict method is kinda overkill, as they're just points on the fitted line from the model. But I assume you also wanted to plot CIs/PIs, as the example illustrated.

-- Bert

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
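A small check of Bert's point that the fitted values are simply points on the fitted line. This is a sketch using the OP's data. It assumes only that lm() treats a Date predictor as a number of days since 1970-01-01, so intercept + slope * as.numeric(date) reproduces what predict() returns.

```r
# OP's data (2004-2018 year-end values)
date <- as.Date(c("2018-12-31", "2017-12-31", "2016-12-31", "2015-12-31",
                  "2014-12-31", "2013-12-31", "2012-12-31", "2011-12-31",
                  "2010-12-31", "2009-12-31", "2008-12-31", "2007-12-31",
                  "2006-12-31", "2005-12-31", "2004-12-31"))
value <- c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209, 11099,
           10087, 14987, 11098, 13421, 9023, 12098)
model <- lm(value ~ date)
dfuture <- data.frame(date = seq(as.Date("2019-12-31"), by = "1 year",
                                 length.out = 10))

# Without 'interval', predict() returns a plain named vector of fitted values
fit_predict <- predict(model, dfuture)

# The same numbers straight from the fitted line: intercept + slope * x,
# where x is the Date expressed as days since 1970-01-01
cf <- coef(model)
fit_by_hand <- cf[1] + cf[2] * as.numeric(dfuture$date)

all.equal(unname(fit_predict), unname(fit_by_hand))
```

If only the line is wanted, widening the plot's xlim is enough, because abline() draws the fitted line across the whole plotting region.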
Re: [R] Plot for 10 years extrapolation
from ?predict.lm:

"predict.lm produces a vector of predictions or a matrix of predictions and bounds with column names fit, lwr, and upr if interval is set."

ergo:

predict(model, dfuture, interval = "prediction")[,"fit"]  ## or [,1], as it's the first column in the returned matrix

is your vector of predicted values. You can plot it against dfuture$date however you like, e.g. with different colors, symbols, or whatever. Exactly how you do this depends on which graphics package you are using. The example in ?predict.lm shows how to do it with R's base graphics, overlaying prediction and confidence intervals.

Cheers,
Bert
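A minimal base-graphics sketch of what Bert describes, using the OP's data and model. The essential change is widening xlim and ylim so that the future dates and interval bounds fall inside the plot. The colours, symbols and legend text are arbitrary choices, not taken from the thread.

```r
date <- as.Date(c("2018-12-31", "2017-12-31", "2016-12-31", "2015-12-31",
                  "2014-12-31", "2013-12-31", "2012-12-31", "2011-12-31",
                  "2010-12-31", "2009-12-31", "2008-12-31", "2007-12-31",
                  "2006-12-31", "2005-12-31", "2004-12-31"))
value <- c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209, 11099,
           10087, 14987, 11098, 13421, 9023, 12098)
model <- lm(value ~ date)
dfuture <- data.frame(date = seq(as.Date("2019-12-31"), by = "1 year",
                                 length.out = 10))

# Matrix with columns fit, lwr, upr (95% prediction interval by default)
pred <- predict(model, dfuture, interval = "prediction")

# Extend both axes so the observed data, the extrapolation and the bounds all fit
plot(value ~ date, col = "grey", pch = 20, cex = 1.5, main = "Plot",
     xlim = range(date, dfuture$date),
     ylim = range(value, pred))
abline(model, col = "darkorange", lwd = 2)         # fitted line spans the plot
points(dfuture$date, pred[, "fit"], col = "darkorange", pch = 17)
lines(dfuture$date, pred[, "lwr"], col = "steelblue", lty = 2)
lines(dfuture$date, pred[, "upr"], col = "steelblue", lty = 2)
legend("topleft", bty = "n",
       legend = c("observed", "extrapolated fit", "95% prediction interval"),
       col = c("grey", "darkorange", "steelblue"),
       pch = c(20, 17, NA), lty = c(NA, NA, 2))
```

Passing interval = "confidence" instead gives the narrower band for the mean response. The example in ?predict.lm overlays both.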
Re: [R] Need help to resolve the Error: evaluation nested too deeply: infinite recursion / options(expressions=)? Execution halted
Hmm, I can't replicate (i.e., it works fine for me). What are the results of your sessionInfo() (from a *clean* R session)?

==
R Under development (unstable) (2023-10-25 r85410)
Platform: x86_64-pc-linux-gnu
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS:   /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3;  LAPACK version 3.10.0

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

loaded via a namespace (and not attached):
 [1] desc_1.4.2        R6_2.5.1          bspm_0.5.3        remotes_2.4.2.1
 [5] ps_1.7.5          cli_3.6.1         processx_3.8.2    callr_3.7.3
 [9] compiler_4.4.0    prettyunits_1.2.0 rprojroot_2.0.3   tools_4.4.0
[13] pkgbuild_1.4.2    curl_5.0.1        crayon_1.5.2
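One way to produce the requested sessionInfo() from a clean session without leaving the current one is to run it in a fresh Rscript process. This is a sketch. It assumes that "clean" means started with --vanilla, so no site or user profile and no saved workspace are loaded. It uses only base R; the Rscript location comes from R.home("bin").

```r
# Path to the Rscript that belongs to the running R installation
rscript <- file.path(R.home("bin"), "Rscript")

# --vanilla skips .Rprofile/Rprofile.site and any saved workspace.
# system2() quotes the command itself, but arguments pass through the shell,
# so the expression is quoted with shQuote().
out <- system2(rscript, c("--vanilla", "-e", shQuote("sessionInfo()")),
               stdout = TRUE)

writeLines(out)  # paste this output into the reply
```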
[R] Need help to resolve the Error: evaluation nested too deeply: infinite recursion / options(expressions=)? Execution halted
Hello Colleagues,

I am trying to get a Git repository using the *remotes* package, with remotes::install_github("dcl-docs/dcldata"). However, I am getting the following error message. I have absolutely no idea what this error message means or how to get around it. Can anyone please suggest a way out?

Here is the error message that I am getting:

> remotes::install_github("dcl-docs/dcldata")
Downloading GitHub repo dcl-docs/dcldata@HEAD
Running `R CMD build`...
* checking for file 'C:\Users\admin\AppData\Local\Temp\RtmpWmVvZI\remotes2ddc1b982b8\dcl-docs-dcldata-0a08cbb/DESCRIPTION' ... OK
* preparing 'dcldata':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'dcldata_0.1.2.9000.tar.gz'
Installing package into ‘C:/Users/admin/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Execution halted
Warning: installation of package ‘C:/Users/admin/AppData/Local/Temp/RtmpWmVvZI/file2ddc7bed1881/dcldata_0.1.2.9000.tar.gz’ had non-zero exit status

Many thanks for the prompt help as always!

Regards
Siddharth
[R] Plot for 10 years extrapolation
Dear R-Experts,

Below is my R code. It works, but I don't know how to complete it to get the final plot with the extrapolation for the 10 additional years.

I am trying to extrapolate my data with a linear fit over the next 10 years, so I create a date sequence for the next 10 years and store it as a data frame to make the prediction possible. Now I am trying to get a plot showing the actual data (from 2004 to 2018) together with the 10-year extrapolation.

Thanks for your help.

date <- as.Date(c("2018-12-31", "2017-12-31", "2016-12-31", "2015-12-31",
                  "2014-12-31", "2013-12-31", "2012-12-31", "2011-12-31",
                  "2010-12-31", "2009-12-31", "2008-12-31", "2007-12-31",
                  "2006-12-31", "2005-12-31", "2004-12-31"))

value <- c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209, 11099,
           10087, 14987, 11098, 13421, 9023, 12098)

model <- lm(value ~ date)

plot(value ~ date, col = "grey", pch = 20, cex = 1.5, main = "Plot")
abline(model, col = "darkorange", lwd = 2)

dfuture <- data.frame(date = seq(as.Date("2019-12-31"), by = "1 year", length.out = 10))

predict(model, dfuture, interval = "prediction")
Re: [R] Inquiry about bandwidth rescaling in Ksmooth
Apologies in advance if my comments don't help, in which case, no need to respond, but I noted in ?ksmooth:

"bandwidth: the bandwidth. The kernels are scaled so that their quartiles (viewed as probability densities) are at ± 0.25*bandwidth."

So, could this be a source of the discrepancies you cited?

Given that ?ksmooth explicitly says:

"Note: This function was implemented for compatibility with S, although it is nowhere near as slow as the S function. Better kernel smoothers are available in other packages such as KernSmooth."

One wonders why you bother with it at all? (That was rhetorical -- do not answer.)

Cheers,
Bert
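The documented scaling accounts for the discrepancy Jan describes. For a Gaussian kernel with standard deviation h, the upper quartile is qnorm(0.75) * h, about 0.674 * h. Setting that equal to 0.25 * bandwidth gives bandwidth = h * qnorm(0.75) / 0.25, roughly 2.7 * h. The sketch below compares ksmooth() with a hand-written Nadaraya-Watson estimator; nw_gauss() and ksmooth_bw_for_sd() are hypothetical helpers, and the simulated data are arbitrary. The conversion is derived from the ?ksmooth text quoted above, not from the C source.

```r
# Hypothetical helper: Gaussian-kernel Nadaraya-Watson estimator whose
# kernel has standard deviation h
nw_gauss <- function(x, y, x0, h) {
  vapply(x0, function(p) {
    w <- dnorm((x - p) / h)   # Gaussian weights centred at p
    sum(w * y) / sum(w)       # locally weighted mean
  }, numeric(1))
}

# Hypothetical helper: ksmooth bandwidth whose "normal" kernel has sd h,
# from "quartiles at +/- 0.25 * bandwidth"
ksmooth_bw_for_sd <- function(h) h * qnorm(0.75) / 0.25

set.seed(1)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.2)
h <- 0.5
grid <- seq(1, 9, by = 0.5)

fit_nw   <- nw_gauss(x, y, grid, h)
fit_same <- ksmooth(x, y, kernel = "normal", bandwidth = h,
                    x.points = grid)$y             # narrower kernel than sd h
fit_conv <- ksmooth(x, y, kernel = "normal", bandwidth = ksmooth_bw_for_sd(h),
                    x.points = grid)$y             # kernel with sd h

max(abs(fit_nw - fit_same))  # not small in general
max(abs(fit_nw - fit_conv))  # close to zero
```

Small residual differences may remain in the converted comparison, so it is safer to compare with a tolerance than to expect exact equality.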
Re: [R] Bug in print for data frames?
Dear R users!

Thank you for your excellent replies. I didn't know that print.data.frame expands matrix-like values in this way. Why doesn't it call the column in my example C.A? I understand that something like that happens when the data.frame in position three has multiple columns. But your answers have helped me understand this better.
[R] Inquiry about bandwidth rescaling in Ksmooth
Dear Sir, Madam, or to whom this may concern,

My name is Jan Failenschmid and I am a Ph.D. student at Tilburg University. For my project I have been looking into different types of kernel regression estimators and the corresponding R functions.

While comparing different functions, I noticed that stats::ksmooth returned different estimates for the same bandwidth than other kernel regression estimators that should be equivalent (i.e. the local polynomial estimators KernSmooth::locpoly and locpol::locpol with degree 0). However, when optimizing the bandwidth of ksmooth separately using the same loss function, I find estimates comparable to those of the other two estimators at a different (larger) bandwidth. To confirm this, I wrote my own Nadaraya-Watson kernel regression estimator, which is consistent with the two local polynomial estimators and shows the same discordance with ksmooth.

This led me to suspect that the bandwidth passed to ksmooth is rescaled or transformed within the function. Unfortunately, I was not able to confirm this from either the code of the function or the documentation. It would be of great help to me if you could clarify this for me.

Thank you very much in advance for your time and help.

Kind regards,
Jan Failenschmid
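The same documented scaling also applies to ksmooth's default "box" kernel. A uniform density on [-a, a] has its quartiles at ±a/2, so "quartiles at ± 0.25*bandwidth" implies a window of half-width bandwidth / 2. In other words, bandwidth is the full window width. A sketch on a regular grid; box_mean() is a hypothetical helper, and the x values are chosen so that no point lies exactly on a window edge.

```r
# Hypothetical helper: plain moving average over |x - x0| <= half_width
box_mean <- function(x, y, x0, half_width) {
  vapply(x0, function(p) mean(y[abs(x - p) <= half_width]), numeric(1))
}

x <- seq(0.05, 9.95, by = 0.1)   # no point sits exactly at x0 +/- 0.5
y <- sin(x)
b <- 1                           # ksmooth bandwidth
grid <- 2:8

fit_ks  <- ksmooth(x, y, kernel = "box", bandwidth = b, x.points = grid)$y
fit_box <- box_mean(x, y, grid, half_width = b / 2)

max(abs(fit_ks - fit_box))
```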
Re: [R] Text in parentheses.
I recommend cutting snippets out of your code by stopping the code at the point of interest and using dput() to pull out "data as it is" before the troublesome section, and then using the reprex package to test that the snippet runs. Either you will notice the problem on your own while taking this time, or you will end up with something we can actually try out. Your current approach fails to provide everything we might need precisely because you don't know what you need to know to solve your problem.

If you have privacy issues with the data, then once you have a reprex you can make a shareable one by replacing the private data with generic data of the same type. Also, you should often experiment to reduce your sample data to the minimum necessary to reproduce the problem, as this will often remove unnecessary confusion as well. Remember, you are trying to learn how to solve your problems on your own anyway, so this is not wasted time even if you do end up solving it on your own.

--
Sent from my phone. Please excuse my brevity.
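A minimal sketch of the dput() step described above. The data frame snippet is a hypothetical stand-in for whatever object exists just before the troublesome section. dput() prints R code that recreates the object, and evaluating that code gives back an identical copy, which is why its output is suitable for pasting into a post.

```r
# Hypothetical stand-in for the data just before the problem appears
snippet <- data.frame(id = 1:3,
                      label = c("dilemma1(ref.)", "scigrn1(ref.)", "other"))

# Prints code such as structure(list(id = 1:3, ...), class = "data.frame", ...)
dput(snippet)

# Round trip: parse and evaluate the printed code to rebuild the object
txt <- capture.output(dput(snippet))
rebuilt <- eval(parse(text = txt))
identical(rebuilt, snippet)
```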
Re: [R] Text in parentheses.
Hi,

It isn't at all clear to me what you're trying to do. For one thing, you never actually add more items to cat.ref in the code snippet you give. You do use c() on zx.ref - is that what you mean? But you aren't adding cat.ref to it, you're adding v$cat.ref, and I have no idea what that might contain. Lots of things, apparently.

It's possible that something in pred0 is stripping things out of cat.ref, I suppose, since c() alone won't do that:

> cat.ref<-"dilemma1(ref.)"
> cat.ref <- c(cat.ref, "scigrn1(ref.)")
> cat.ref
[1] "dilemma1(ref.)" "scigrn1(ref.)"

I think we need a reproducible example showing the actual problem, or at least a very clear explanation. For instance, what's

dput(v$cat.ref)

Is it the same as cat.ref, or is that where the alteration happens?

Sarah

--
Sarah Goslee (she/her)
http://www.numberwright.com
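One possible source of the alteration, offered as a hypothesis since pred0() was not posted: the output "dilemma1.ref.." is exactly what make.names() produces from "dilemma1(ref.)". make.names() replaces "(" and ")" with ".", and data.frame() applies it to column names unless check.names = FALSE. The sketch below shows the mangling and one way to avoid it.

```r
lab <- c("dilemma1(ref.)", "scigrn1(ref.)")

# Syntactic-name conversion: invalid characters become "."
make.names(lab)                       # "dilemma1.ref.." "scigrn1.ref.."

# data.frame() runs names through make.names() by default ...
m <- matrix(1:2, nrow = 1, dimnames = list(NULL, lab))
names(data.frame(m))                  # mangled
# ... unless check.names = FALSE
names(data.frame(m, check.names = FALSE))
```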
[R] Text in parentheses.
Dear All

My program is long and, sorry, I do not have a replicable set of code to present. But I present a chunk of code at the end below. Essentially,

1. I initialize cat.ref as NULL (see line 1).
2. Then, I repeatedly add elements to cat.ref, where each element includes parentheses in double quotes (see line 2).

I had expected cat.ref to eventually contain the list

dilemma1(ref),scigrn1(ref),...

Not so. I end up getting the following (see the first column, which is not in parentheses like (ref.)):

dilemma1.ref..  22.356  2.619  8.535  0.000  ***
scigrn1.ref..   22.474  2.697  8.334  0.000  ***

Any idea how I might revise lines like the following (first line below):

dv.group<-c("dilemma2","dilemma3"); cat.ref<-"dilemma1(ref.)"

etc. Thanks.

ap0<-zx.ref<-NULL

dv.group<-c("dilemma2","dilemma3"); cat.ref<-"dilemma1(ref.)"
if(any(dv.group%in%jindex)){
  v<-pred0(dv.group,cat.ref)
  ap0<-rbind(ap0,v$ap0); zx.ref<-c(zx.ref,v$cat.ref)
}

dv.group<-c("scigrn2","scigrn3"); cat.ref<-"scigrn1(ref.)"
if(any(dv.group%in%jindex)){
  v<-pred0(dv.group,cat.ref)
  ap0<-rbind(ap0,v$ap0); zx.ref<-c(zx.ref,v$cat.ref)
}
Re: [R] Bug in print for data frames?
Hello,

Inline.

On 26/10/2023 at 13:32, Ebert, Timothy Aaron wrote:
> The "problem" goes away if you use
>
> x$C <- y[1,]

Actually, if I understand correctly, the OP wants the column:

x$C <- y[,1]

In this case it produces the same output because y is a df with only one row. But that is a very special case; the general case would be to extract the column.

Hope this helps,

Rui Barradas
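A short sketch contrasting the two kinds of extraction discussed in this thread, using the OP's x and y. Single-bracket y[1] returns a one-column data.frame, so x$C becomes a data.frame column. Its name is still "C", but print() shows the inner name "A". Double-bracket y[[1]] (like y$A or y[, 1]) returns the plain vector.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

x$C <- y[1]          # one-column data.frame assigned as a column
is.data.frame(x$C)   # TRUE
names(x)             # "A" "B" "C": the column is still called C

x$C <- y[[1]]        # plain numeric vector, same as y$A or y[, 1]
is.data.frame(x$C)   # FALSE
x                    # prints A, B, C as expected
```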
Re: [R] Bug in print for data frames?
The "problem" goes away if you use x$C <- y[1,]

If you have another row in your x, say:

x <- data.frame(A=c(1,4), B=c(2,5), C=c(3,6))

then your code x$C <- y[1] returns an error. If y has the same number of rows as x$C, then R gives the same outcome as in your example. It looks like your code tells R to replace all of column C (including the name) with all of data frame y. Maybe unexpected, but not a bug. It is consistent.
Re: [R] Bug in print for data frames?
At 07:18 on 25/10/2023, Christian Asseburg wrote:
> Why does the print(x) not show "C" as the name of the third element?

Hello,

To expand on the good answers already given, I will present two other example data sets.

Example 1.

Imagine that instead of assigning just one column from y to x$C you assign two columns. The result is a data.frame column; see what is displayed as the column names. And unlike what happens with `[`, when assigning columns 1:2 the operator `[[` doesn't work. You will have to extract the columns y$A and y$B one by one.

x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)
str(y)
#> 'data.frame': 1 obs. of 2 variables:
#>  $ A: num 1
#>  $ B: num 4

x$C <- y[1:2]
x
#>   A B C.A C.B
#> 1 1 2   1   4
str(x)
#> 'data.frame': 1 obs. of 3 variables:
#>  $ A: num 1
#>  $ B: num 2
#>  $ C:'data.frame': 1 obs. of 2 variables:
#>   ..$ A: num 1
#>   ..$ B: num 4

x[[1:2]]  # doesn't work
#> Error in .subset2(x, i, exact = exact): subscript out of bounds

Example 2.

Sometimes it is useful to get a result like this first and then correct the resulting df, for instance when computing more than one summary statistic. str(agg) below shows that the summary statistics are stored in a matrix, so you have a matrix column. And once again the displayed names reflect that.

The trick to make the result a df is to extract all but the last column as a sub-df, extract the last column's values as a matrix (which it is) and then cbind the two together. cbind is a generic function. Since the first argument to cbind is a sub-df, the method called is cbind.data.frame and the result is a df.

df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)
# the anonymous function computes more than one summary statistic;
# note that it returns a named vector
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))
agg
#>   A X.Mean      X.S
#> 1 a  14.50 9.082951
#> 2 b  15.50 9.082951
#> 3 c  16.50 9.082951

# similar effect as in the OP. The difference is that the last
# column is a matrix, not a data.frame
str(agg)
#> 'data.frame': 3 obs. of 2 variables:
#>  $ A: chr  "a" "b" "c"
#>  $ X: num [1:3, 1:2] 14.5 15.5 16.5 9.08 9.08 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Mean" "S"

# nc is just a convenience, avoids repeated calls to ncol
nc <- ncol(agg)
cbind(agg[-nc], agg[[nc]])
#>   A Mean        S
#> 1 a 14.5 9.082951
#> 2 b 15.5 9.082951
#> 3 c 16.5 9.082951

# all is well
cbind(agg[-nc], agg[[nc]]) |> str()
#> 'data.frame': 3 obs. of 3 variables:
#>  $ A   : chr  "a" "b" "c"
#>  $ Mean: num  14.5 15.5 16.5
#>  $ S   : num  9.08 9.08 9.08

If the anonymous function hadn't returned a named vector, the new column names would have been "1", "2"; try it.

Hope this helps,

Rui Barradas
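Rui's cbind(agg[-nc], agg[[nc]]) idiom assumes the matrix column is the last one. A rough generalization, sketched here under the assumption that all other columns are ordinary atomic vectors, applies the same idea to every matrix or data.frame column at once. The name flatten_cols is hypothetical, not a base R function, and unlike Rui's version it prefixes the inner names with the outer column name (C.A, X.Mean), which mirrors how print() labels such columns.

```r
# flatten_cols() is a hypothetical helper, not part of base R.
# It replaces every matrix or data.frame column of `df` with its
# individual columns, named "<outer>.<inner>" (e.g. "C.A", "X.Mean").
# Assumption: all other columns are plain atomic vectors.
flatten_cols <- function(df) {
  pieces <- lapply(names(df), function(nm) {
    col <- df[[nm]]
    if (is.data.frame(col) || is.matrix(col)) {
      out <- as.data.frame(col)
      # as.data.frame() supplies V1, V2, ... if a matrix has no colnames
      names(out) <- paste(nm, names(out), sep = ".")
      out
    } else {
      # ordinary column: wrap it in a one-column data.frame
      setNames(data.frame(col), nm)
    }
  })
  # cbind() on data.frames dispatches to cbind.data.frame
  do.call(cbind, pieces)
}

# Example 1: a data.frame column
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)
x$C <- y[1:2]
flat_x <- flatten_cols(x)
str(flat_x)

# Example 2: a matrix column returned by aggregate()
df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)
agg <- aggregate(X ~ A, df1, function(x) c(Mean = mean(x), S = sd(x)))
flat_agg <- flatten_cols(agg)
str(flat_agg)
```

The helper does not recurse, so a data.frame column nested inside another data.frame column would still need a second pass.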
Re: [R] Bug in print for data frames?
On 25/10/2023 2:18 a.m., Christian Asseburg wrote:
> Why does the print(x) not show "C" as the name of the third element?

y[1] is a dataframe with one column, i.e. it is identical to y. To get the result you expected, you should have used y[[1]], to extract column 1.

Since dataframes are lists, you can assign them as columns of other dataframes, and you'll create a single column in the result that holds all the columns of the dataframe you're assigning. This means that

x$C <- y[1]

replaces the C column of x with a dataframe. It retains the name C (you can see this if you print names(x)), but since the column contains a dataframe, it chooses to use the column name of y when printing.

If you try

x$D <- x

you'll see it generate new names when printing, but the names within x remain as A, B, C, D.
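The points about y[1] versus y[[1]], names(x), and x$D <- x can be checked directly. A minimal sketch using only base R, with the object names from the original post; the checks themselves are illustrative:

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

# y[1] keeps the data.frame class; y[[1]] extracts the column itself
class(y[1])    # "data.frame"
class(y[[1]])  # "numeric"

x$C <- y[1]
names(x)            # still "A" "B" "C"
is.data.frame(x$C)  # TRUE: column C holds a one-column data.frame

# Assigning the whole data.frame as a new column: printing generates
# new labels, but names(x) only gains the added "D"
x$D <- x
names(x)            # "A" "B" "C" "D"
print(x)
```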
This is a situation where tibbles do a better job than dataframes: if you created x and y as tibbles instead of dataframes and executed your code, you'd see this:

library(tibble)
x <- tibble(A = 1, B = 2, C = 3)
y <- tibble(A = 1)
x$C <- y[1]
x
#> # A tibble: 1 × 3
#>       A     B   C$A
#>   <dbl> <dbl> <dbl>
#> 1     1     2     1

Duncan Murdoch
Re: [R] Bug in print for data frames?
I would say this is not an error, but I think what you wrote isn't what you intended to do anyway. y[1] is a data.frame that contains only the first column of y, which you assign to x$C, so now x$C is a data.frame. R allows data.frame columns to be plain vectors as well as matrices and data.frames, basically anything as long as it has the correct length or nrow. When the data.frame is formatted for printing, each column (including C) is formatted, and the results are then column-bound into another data.frame using as.data.frame.list, so the third column takes the name A because that's the name of the column from y.

I think what you meant to do is

x$C <- y[[1]]  ## double brackets instead of single

On Thu, Oct 26, 2023 at 4:14 AM Christian Asseburg wrote:
> Why does the print(x) not show "C" as the name of the third element?
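The difference between the stored names and the printed header can be seen by capturing the first line of print() output. A small sketch assuming only base R; printed_header is a hypothetical helper name, and the matrix column at the end only illustrates that matrices are accepted as columns when nrow() matches:

```r
# printed_header() is a hypothetical helper: it returns the column
# labels from the first line of print(df), split on whitespace.
printed_header <- function(df) {
  strsplit(trimws(capture.output(print(df))[1]), " +")[[1]]
}

x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

x$C <- y[1]          # single brackets: C becomes a data.frame column
names(x)             # "A" "B" "C"
printed_header(x)    # "A" "B" "A"

x$C <- y[[1]]        # double brackets: C is a plain numeric column
printed_header(x)    # "A" "B" "C"

# A matrix is also accepted as a column, as long as nrow() matches
x$M <- matrix(1:2, nrow = 1)
str(x)
```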
[R] Bug in print for data frames?
Hi! I came across this unexpected behaviour in R. First I thought it was a bug in the assignment operator <- but now I think it's maybe a bug in the way data frames are being printed. What do you think?

Using R 4.3.1:

> x <- data.frame(A = 1, B = 2, C = 3)
> y <- data.frame(A = 1)
> x
  A B C
1 1 2 3
> x$B <- y$A # works as expected
> x
  A B C
1 1 1 3
> x$C <- y[1] # makes C disappear
> x
  A B A
1 1 1 1
> str(x)
'data.frame': 1 obs. of 3 variables:
 $ A: num 1
 $ B: num 1
 $ C:'data.frame': 1 obs. of 1 variable:
  ..$ A: num 1

Why does the print(x) not show "C" as the name of the third element? I did mess up the data frame (and this was a mistake on my part), but finding the bug was harder because print(x) didn't show the C any longer.

Thanks. With best wishes -

. . . Christian
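One way to catch this kind of slip earlier is to check which columns are themselves data.frames or matrices, since print(x) and names(x) can disagree about the label. A minimal sketch using only base R; nested_cols is a hypothetical helper name:

```r
# nested_cols() is a hypothetical helper: it returns the names of
# columns that are themselves data.frames or matrices.
nested_cols <- function(df) {
  is_nested <- vapply(df, function(col) is.data.frame(col) || is.matrix(col),
                      logical(1))
  names(df)[is_nested]
}

x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x$C <- y[1]

nested_cols(x)   # "C"

# The class of each column shows the same thing
vapply(x, function(col) class(col)[1], character(1))
```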