Re: [R] Plot for 10 years extrapolation

2023-10-26 Thread Bert Gunter
Incidentally, if all you wanted to do was plot fitted values, the
predict method is kinda overkill, as it's just the fitted line from
the model. But I assume you wanted to plot CI's/PI's also, as the
example illustrated.

-- Bert

On Thu, Oct 26, 2023 at 1:56 PM Bert Gunter  wrote:
>
> from ?predict.lm:
>
> "predict.lm produces a vector of predictions or a matrix of
> predictions and bounds with column names fit, lwr, and upr if interval
> is set. "
>
> ergo:
> predict(model, dfuture, interval = "prediction")[,"fit"]  ## or [,1]
> as it's the first column in the returned matrix
>
> is your vector of predicted values that you can plot against
> dfuture$date however you would like, e.g. with different colors,
> symbols, or whatever. Exactly how you do this depends on what graphics
> package you are using. The example in ?predict.lm shows you how to do
> it with R's base graphics and overlaying prediction and confidence
> intervals.
>
> Cheers,
> Bert
>
> On Thu, Oct 26, 2023 at 11:27 AM varin sacha via R-help
>  wrote:
> >
> > Dear R-Experts,
> >
> > My R code below works, but I don't know how to finish it to get the final
> > plot with the extrapolation for the next 10 years.
> >
> > I am trying to extrapolate my data with a linear fit over the next 10
> > years, so I created a date sequence for the next 10 years and stored it in
> > a data frame to make the prediction possible.
> > Now I would like a plot showing the actual data (2004 to 2018) together
> > with the 10-year extrapolation.
> >
> > Thanks for your help.
> >
> > 
> > date <-as.Date(c("2018-12-31", "2017-12-31", "2016-12-31", "2015-12-31", 
> > "2014-12-31", "2013-12-31", "2012-12-31", "2011-12-31", "2010-12-31", 
> > "2009-12-31", "2008-12-31", "2007-12-31", "2006-12-31", "2005-12-31", 
> > "2004-12-31"))
> >
> > value <-c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209, 11099, 
> > 10087, 14987, 11098, 13421, 9023, 12098)
> >
> > model <- lm(value~date)
> >
> > plot(value~date ,col="grey",pch=20,cex=1.5,main="Plot")
> > abline(model,col="darkorange",lwd=2)
> >
> > dfuture <- data.frame(date=seq(as.Date("2019-12-31"), by="1 year", 
> > length.out=10))
> >
> > predict(model,dfuture,interval="prediction")
> > 
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.



Re: [R] Plot for 10 years extrapolation

2023-10-26 Thread Bert Gunter
from ?predict.lm:

"predict.lm produces a vector of predictions or a matrix of
predictions and bounds with column names fit, lwr, and upr if interval
is set. "

ergo:
predict(model, dfuture, interval = "prediction")[,"fit"]  ## or [,1]
as it's the first column in the returned matrix

is your vector of predicted values that you can plot against
dfuture$date however you would like, e.g. with different colors,
symbols, or whatever. Exactly how you do this depends on what graphics
package you are using. The example in ?predict.lm shows you how to do
it with R's base graphics and overlaying prediction and confidence
intervals.
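
A minimal base-graphics sketch of the above, reusing the poster's data and object names. The colours, symbols, and axis limits are my own choices, not anything prescribed by ?predict.lm:

```r
# Same data as in the original post (year-end dates 2018 down to 2004).
date  <- as.Date(paste0(2018:2004, "-12-31"))
value <- c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209, 11099,
           10087, 14987, 11098, 13421, 9023, 12098)
model <- lm(value ~ date)

dfuture <- data.frame(date = seq(as.Date("2019-12-31"), by = "1 year",
                                 length.out = 10))
pred <- predict(model, dfuture, interval = "prediction")  # columns fit, lwr, upr

# Widen both axes so the future dates and the interval bounds are visible.
plot(value ~ date, col = "grey", pch = 20, cex = 1.5, main = "Plot",
     xlim = range(c(date, dfuture$date)),
     ylim = range(c(value, pred)))
abline(model, col = "darkorange", lwd = 2)

# Extrapolated points plus the prediction interval as dashed lines.
points(dfuture$date, pred[, "fit"], col = "blue", pch = 17)
lines(dfuture$date, pred[, "lwr"], col = "blue", lty = 2)
lines(dfuture$date, pred[, "upr"], col = "blue", lty = 2)
```

Widening xlim and ylim in the first plot() call matters: with the default limits the future dates fall outside the plotting region and nothing new appears.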

Cheers,
Bert

On Thu, Oct 26, 2023 at 11:27 AM varin sacha via R-help
 wrote:
>
> Dear R-Experts,
>
> My R code below works, but I don't know how to finish it to get the final
> plot with the extrapolation for the next 10 years.
>
> I am trying to extrapolate my data with a linear fit over the next 10
> years, so I created a date sequence for the next 10 years and stored it in
> a data frame to make the prediction possible.
> Now I would like a plot showing the actual data (2004 to 2018) together
> with the 10-year extrapolation.
>
> Thanks for your help.
>
> 
> date <-as.Date(c("2018-12-31", "2017-12-31", "2016-12-31", "2015-12-31", 
> "2014-12-31", "2013-12-31", "2012-12-31", "2011-12-31", "2010-12-31", 
> "2009-12-31", "2008-12-31", "2007-12-31", "2006-12-31", "2005-12-31", 
> "2004-12-31"))
>
> value <-c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209, 11099, 
> 10087, 14987, 11098, 13421, 9023, 12098)
>
> model <- lm(value~date)
>
> plot(value~date ,col="grey",pch=20,cex=1.5,main="Plot")
> abline(model,col="darkorange",lwd=2)
>
> dfuture <- data.frame(date=seq(as.Date("2019-12-31"), by="1 year", 
> length.out=10))
>
> predict(model,dfuture,interval="prediction")
> 
>


Re: [R] Need help to resolve the Error: evaluation nested too deeply: infinite recursion / options(expressions=)? Execution halted

2023-10-26 Thread Ben Bolker
  Hmm, I can't replicate (i.e., it works fine for me).  What are the 
results of your sessionInfo() (from a *clean* R session)?
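
In case it helps, a small sketch of what could be run in a session started with `R --vanilla` (which skips .Rprofile, .Renviron, and any saved workspace). The last two checks are my own additions, on the guess that a startup file or a changed `expressions` option might be involved:

```r
# Run in a fresh session started from a terminal with:  R --vanilla
info <- sessionInfo()
print(info)

# Current limit on nested evaluations (the option named in the error message).
getOption("expressions")

# Startup files that a non-vanilla session would read, if they exist.
file.exists(c("~/.Rprofile", ".Rprofile"))
```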


==
R Under development (unstable) (2023-10-25 r85410)
Platform: x86_64-pc-linux-gnu
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS:   /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3;  LAPACK version 3.10.0

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

loaded via a namespace (and not attached):
 [1] desc_1.4.2        R6_2.5.1          bspm_0.5.3        remotes_2.4.2.1
 [5] ps_1.7.5          cli_3.6.1         processx_3.8.2    callr_3.7.3
 [9] compiler_4.4.0    prettyunits_1.2.0 rprojroot_2.0.3   tools_4.4.0
[13] pkgbuild_1.4.2    curl_5.0.1        crayon_1.5.2


On 2023-10-26 9:59 a.m., siddharth sahasrabudhe via R-help wrote:

Hello Colleagues,

I am trying to install a package from a GitHub repository using the *remotes*
package, by calling remotes::install_github("dcl-docs/dcldata").

However, I am getting the following error message. I have absolutely no
idea what this error message means or how to get rid of it. Can
anyone please suggest a way out?

Here is the error message that I am getting:

remotes::install_github("dcl-docs/dcldata")
Downloading GitHub repo dcl-docs/dcldata@HEAD
Running `R CMD build`...
* checking for file 'C:\Users\admin\AppData\Local\Temp\RtmpWmVvZI\remotes2ddc1b982b8\dcl-docs-dcldata-0a08cbb/DESCRIPTION' ... OK
* preparing 'dcldata':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'dcldata_0.1.2.9000.tar.gz'
Installing package into ‘C:/Users/admin/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Execution halted
Warning: installation of package ‘C:/Users/admin/AppData/Local/Temp/RtmpWmVvZI/file2ddc7bed1881/dcldata_0.1.2.9000.tar.gz’ had non-zero exit status


Many thanks for the prompt help as always!

Regards
Siddharth


[R] Need help to resolve the Error: evaluation nested too deeply: infinite recursion / options(expressions=)? Execution halted

2023-10-26 Thread siddharth sahasrabudhe via R-help
Hello Colleagues,

I am trying to install a package from a GitHub repository using the *remotes*
package, by calling remotes::install_github("dcl-docs/dcldata").

However, I am getting the following error message. I have absolutely no
idea what this error message means or how to get rid of it. Can
anyone please suggest a way out?

Here is the error message that I am getting:

> remotes::install_github("dcl-docs/dcldata")
Downloading GitHub repo dcl-docs/dcldata@HEAD
Running `R CMD build`...
* checking for file 'C:\Users\admin\AppData\Local\Temp\RtmpWmVvZI\remotes2ddc1b982b8\dcl-docs-dcldata-0a08cbb/DESCRIPTION' ... OK
* preparing 'dcldata':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'dcldata_0.1.2.9000.tar.gz'
Installing package into ‘C:/Users/admin/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Execution halted
Warning: installation of package ‘C:/Users/admin/AppData/Local/Temp/RtmpWmVvZI/file2ddc7bed1881/dcldata_0.1.2.9000.tar.gz’ had non-zero exit status


Many thanks for the prompt help as always!

Regards
Siddharth


[R] Plot for 10 years extrapolation

2023-10-26 Thread varin sacha via R-help
Dear R-Experts,

My R code below works, but I don't know how to finish it to get the final plot 
with the extrapolation for the next 10 years.

I am trying to extrapolate my data with a linear fit over the next 10 years, 
so I created a date sequence for the next 10 years and stored it in a data 
frame to make the prediction possible.
Now I would like a plot showing the actual data (2004 to 2018) together with 
the 10-year extrapolation.

Thanks for your help.


date <-as.Date(c("2018-12-31", "2017-12-31", "2016-12-31", "2015-12-31", 
"2014-12-31", "2013-12-31", "2012-12-31", "2011-12-31", "2010-12-31", 
"2009-12-31", "2008-12-31", "2007-12-31", "2006-12-31", "2005-12-31", 
"2004-12-31"))
 
value <-c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209, 11099, 10087, 
14987, 11098, 13421, 9023, 12098)
 
model <- lm(value~date)
 
plot(value~date ,col="grey",pch=20,cex=1.5,main="Plot")
abline(model,col="darkorange",lwd=2)
 
dfuture <- data.frame(date=seq(as.Date("2019-12-31"), by="1 year", 
length.out=10))
 
predict(model,dfuture,interval="prediction")




Re: [R] Inquiry about bandwidth rescaling in Ksmooth

2023-10-26 Thread Bert Gunter
Apologies in advance if my comments don't help, in which case, no need
to respond,  but I noted in ?ksmooth:

"bandwidth
the bandwidth. The kernels are scaled so that their quartiles (viewed
as probability densities) are at ± 0.25*bandwidth." So, could this be
a source of the discrepancies you cited?
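
If that scaling is the explanation, a Gaussian-kernel Nadaraya-Watson estimator with standard deviation 0.25 * bandwidth / qnorm(0.75) (roughly 0.37 * bandwidth) should come close to ksmooth(kernel = "normal"). A rough sketch on simulated data follows; the helper nw() and the data are made up for illustration, and small differences may remain from implementation details such as how the kernel tails are handled:

```r
set.seed(1)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)

# Hypothetical helper: Nadaraya-Watson estimate with a Gaussian kernel of sd h.
nw <- function(x0, x, y, h) {
  sapply(x0, function(p) {
    w <- dnorm((x - p) / h)
    sum(w * y) / sum(w)
  })
}

bw <- 2
ks <- ksmooth(x, y, kernel = "normal", bandwidth = bw, x.points = x)

# sd of a normal density whose quartiles sit at +/- 0.25 * bw
h_scaled <- 0.25 * bw / qnorm(0.75)

d_scaled <- max(abs(ks$y - nw(x, x, y, h_scaled)))  # rescaled bandwidth
d_raw    <- max(abs(ks$y - nw(x, x, y, bw)))        # bandwidth used directly as sd
c(scaled = d_scaled, raw = d_raw)
```

If the first number is close to zero and the second is not, the documented rescaling would account for the mismatch with estimators that take the bandwidth as the kernel's standard deviation.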

Given that ?ksmooth explicitly says:

"Note:
This function was implemented for compatibility with S, although it is
nowhere near as slow as the S function. Better kernel smoothers are
available in other packages such as KernSmooth."

One wonders why you bother with it at all? (That was rhetorical -- do
not answer).

Cheers,
Bert

On Thu, Oct 26, 2023 at 11:06 AM Jan Failenschmid via R-help
 wrote:
>
> Dear Sir, Madam, or to whom this may concern,
>
> my name is Jan Failenschmid and I am a Ph.D. student at Tilburg University.
> For my project I have been looking into different types of kernel regression 
> estimators and corresponding R functions.
> While comparing different functions I noticed that stats::ksmooth returned 
> different estimates for the same bandwidth
> as other kernel regression estimators that should be equivalent (i.e. the 
> local polynomial estimators KernSmooth::locpoly and
> locpol::locpol with degree 0). However, when optimizing the bandwidth of 
> ksmooth separately using the same loss function, I find comparable estimates 
> to the other two estimators for a (larger) different bandwidth. To confirm 
> this, I wrote my own Nadaraya-Watson kernel regression estimator, which is 
> consistent with the two local polynomial estimators and shows the same 
> discordance with ksmooth.
>
> This led me to the suspicion that the bandwidth that is passed to ksmooth is 
> rescaled or transformed within the function. Unfortunately, I was not able to 
> confirm this with either the code of the function or the documentation. It 
> would be of great help to me if you could clarify this for me.
>
> Thank you very much for your time and help in advance.
>
> Kind regards,
>
> Jan Failenschmid
>


Re: [R] Bug in print for data frames?

2023-10-26 Thread Christian Asseburg
Dear R users! Thank you for your excellent replies. I didn't know that the 
print.data.frame expands matrix-like values in this way. Why doesn't it call 
the column in my example C.A? I understand that something like that happens 
when the data.frame in position three has multiple columns. But your answers 
have helped me understand this better.
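
For what it's worth, the printed name appears to follow the same rule data.frame() applies when it flattens a data-frame-valued component: a component with several columns gets prefixed names (C.A, C.B), while a single-column one keeps only its inner name. This is a reading of the observed behaviour, not something stated in this thread. A small sketch:

```r
# One inner column: the outer name C is dropped in favour of the inner name.
one <- data.frame(C = data.frame(A = 1))
names(one)

# Two inner columns: the outer name is used as a prefix.
two <- data.frame(C = data.frame(A = 1, B = 4))
names(two)
```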



[R] Inquiry about bandwidth rescaling in Ksmooth

2023-10-26 Thread Jan Failenschmid via R-help
Dear Sir, Madam, or to whom this may concern,

my name is Jan Failenschmid and I am a Ph.D. student at Tilburg University.
For my project I have been looking into different types of kernel regression 
estimators and corresponding R functions.
While comparing different functions I noticed that stats::ksmooth returned 
different estimates for the same bandwidth
as other kernel regression estimators that should be equivalent (i.e. the local 
polynomial estimators KernSmooth::locpoly and
locpol::locpol with degree 0). However, when optimizing the bandwidth of 
ksmooth separately using the same loss function, I find comparable estimates to 
the other two estimators for a (larger) different bandwidth. To confirm this, I 
wrote my own Nadaraya-Watson kernel regression estimator, which is consistent 
with the two local polynomial estimators and shows the same discordance with 
ksmooth.

This led me to the suspicion that the bandwidth that is passed to ksmooth is 
rescaled or transformed within the function. Unfortunately, I was not able to 
confirm this with either the code of the function or the documentation. It 
would be of great help to me if you could clarify this for me.

Thank you very much for your time and help in advance.

Kind regards,

Jan Failenschmid



Re: [R] Yext in parentheses.

2023-10-26 Thread Jeff Newmiller via R-help
I recommend cutting snippets out of your code by stopping the code at the point 
of interest and using dput() to pull out "data as it is" before the troublesome 
section and then using the reprex package to test that the snippet runs.

Either you will notice the problem on your own while taking this time, or you 
will end up with something we can actually try out. Your current approach fails 
to provide everything we might need precisely because you don't know what you 
need to know to solve your problem.

If you have privacy issues with the data then once you have a reprex you can 
make a shareable reprex by replacing the private data with generic data of the 
same type. Also, you should often experiment to reduce the size of your sample 
data to a minimum necessary to reproduce the problem, as this will often remove 
unnecessary confusion as well.
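
As a rough illustration of that workflow on a built-in data set (mtcars stands in for the real data, and the column choice is arbitrary):

```r
# Cut the data down to the minimum needed to show the problem.
small <- head(mtcars[c("mpg", "cyl")], 3)

# dput() prints R code that recreates the object; paste that output
# into the question instead of describing the data in words.
dput(small)

# Round trip: the deparsed text rebuilds the object.
rebuilt <- eval(parse(text = deparse(small)))
all.equal(rebuilt, small)

# With the reprex package installed, the finished snippet can then be
# checked in a clean session, e.g. by copying it and calling reprex::reprex().
```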

Remember, you are trying to learn how to solve your problems on your own 
anyway, so this is not wasted time even if you do end up solving it on your own.

On October 26, 2023 9:18:10 AM PDT, Steven Yen  wrote:
>Dear All
>
>My program is long and, sorry, I do not have a reproducible example to 
>present. But I include a chunk of code at the end below. Essentially,
>
>1. I initialize cat.ref as NULL (see line 1)
>
>2. Then, I repeatedly add elements to cat.ref, where each element 
>includes parentheses inside double quotes (see line 2).
>
>I had expected cat.ref to eventually contain the list
>
>dilemma1(ref),scigrn1(ref),...
>
>Not so, I end up getting the following (see first column; not in 
>parentheses, like (ref.)).
>
>dilemma1.ref.. 22.356 2.619 8.535 0.000 ***
>scigrn1.ref.. 22.474 2.697 8.334 0.000 ***
>
>Any idea how I might revise lines like the following (first line below):
>
>dv.group<-c("dilemma2","dilemma3"); cat.ref<-"dilemma1(ref.)"
>
>etc. Thanks.
>
>   ap0<-zx.ref<-NULL
>
>   dv.group<-c("dilemma2","dilemma3"); cat.ref<-"dilemma1(ref.)"
>   if(any(dv.group%in%jindex)){
>     v<-pred0(dv.group,cat.ref)
>     ap0<-rbind(ap0,v$ap0); zx.ref<-c(zx.ref,v$cat.ref)
>   }
>
>   dv.group<-c("scigrn2","scigrn3"); cat.ref<-"scigrn1(ref.)"
>   if(any(dv.group%in%jindex)){
>     v<-pred0(dv.group,cat.ref)
>     ap0<-rbind(ap0,v$ap0); zx.ref<-c(zx.ref,v$cat.ref)
>   }
>
>

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] Yext in parentheses.

2023-10-26 Thread Sarah Goslee
Hi,

It isn't at all clear to me what you're trying to do. For one thing,
you never actually add more items to cat.ref in the code snippet you
give.
You do use c() on zx.ref - is that what you mean?

But you aren't adding cat.ref to it, you're adding v$cat.ref, and I
have no idea what that might contain. Lots of things, apparently.

It's possible that something in pred0 is stripping things out of
cat.ref, I suppose, since c() alone won't do that.

> cat.ref<-"dilemma1(ref.)"
> cat.ref <- c(cat.ref, "scigrn1(ref.)")
> cat.ref
[1] "dilemma1(ref.)" "scigrn1(ref.)"

I think we need a reproducible example showing the actual problem, or
at least very clear explanation.

For instance, what's
dput(v$cat.ref)

Is it the same as cat.ref, or is that where the alteration happens?
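
One guess, not confirmed by anything in the posted snippet: the label "dilemma1.ref.." is exactly what make.names() returns for "dilemma1(ref.)", and data.frame() applies that conversion to column names unless check.names = FALSE. If pred0 (whose code we have not seen) builds a data frame along the way, something like this might be happening:

```r
# Syntactically invalid characters such as ( and ) are replaced by dots.
make.names("dilemma1(ref.)")

vals <- c(22.356, 2.619, 8.535, 0)

# Default check.names = TRUE: the name is "fixed".
d1 <- data.frame("dilemma1(ref.)" = vals)
names(d1)

# check.names = FALSE: the name is kept as given.
d2 <- data.frame("dilemma1(ref.)" = vals, check.names = FALSE)
names(d2)
```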

Sarah

On Thu, Oct 26, 2023 at 12:18 PM Steven Yen  wrote:
>
> Dear All
>
> My program is long and, sorry, I do not have a reproducible example to
> present. But I include a chunk of code at the end below. Essentially,
>
> 1. I initialize cat.ref as NULL (see line 1)
>
> 2. Then, I repeatedly add elements to cat.ref, where each element
> includes parentheses inside double quotes (see line 2).
>
> I had expected cat.ref to eventually contain the list
>
> dilemma1(ref),scigrn1(ref),...
>
> Not so, I end up getting the following (see first column; not in
> parentheses, like (ref.)).
>
> dilemma1.ref.. 22.356 2.619 8.535 0.000 ***
> scigrn1.ref.. 22.474 2.697 8.334 0.000 ***
>
> Any idea how I might revise lines like the following (first line below):
>
> dv.group<-c("dilemma2","dilemma3"); cat.ref<-"dilemma1(ref.)"
>
> etc. Thanks.
>
>ap0<-zx.ref<-NULL
>
>dv.group<-c("dilemma2","dilemma3"); cat.ref<-"dilemma1(ref.)"
>if(any(dv.group%in%jindex)){
>  v<-pred0(dv.group,cat.ref)
>  ap0<-rbind(ap0,v$ap0); zx.ref<-c(zx.ref,v$cat.ref)
>}
>
>dv.group<-c("scigrn2","scigrn3"); cat.ref<-"scigrn1(ref.)"
>if(any(dv.group%in%jindex)){
>  v<-pred0(dv.group,cat.ref)
>  ap0<-rbind(ap0,v$ap0); zx.ref<-c(zx.ref,v$cat.ref)
>}
>
>



-- 
Sarah Goslee (she/her)
http://www.numberwright.com



[R] Yext in parentheses.

2023-10-26 Thread Steven Yen
Dear All

My program is long and, sorry, I do not have a reproducible example to 
present. But I include a chunk of code at the end below. Essentially,

1. I initialize cat.ref as NULL (see line 1)

2. Then, I repeatedly add elements to cat.ref, where each element 
includes parentheses inside double quotes (see line 2).

I had expected cat.ref to eventually contain the list

dilemma1(ref),scigrn1(ref),...

Not so, I end up getting the following (see first column; not in 
parentheses, like (ref.)).

dilemma1.ref.. 22.356 2.619 8.535 0.000 ***
scigrn1.ref.. 22.474 2.697 8.334 0.000 ***

Any idea how I might revise lines like the following (first line below):

dv.group<-c("dilemma2","dilemma3"); cat.ref<-"dilemma1(ref.)"

etc. Thanks.

   ap0<-zx.ref<-NULL

   dv.group<-c("dilemma2","dilemma3"); cat.ref<-"dilemma1(ref.)"
   if(any(dv.group%in%jindex)){
     v<-pred0(dv.group,cat.ref)
     ap0<-rbind(ap0,v$ap0); zx.ref<-c(zx.ref,v$cat.ref)
   }

   dv.group<-c("scigrn2","scigrn3"); cat.ref<-"scigrn1(ref.)"
   if(any(dv.group%in%jindex)){
     v<-pred0(dv.group,cat.ref)
     ap0<-rbind(ap0,v$ap0); zx.ref<-c(zx.ref,v$cat.ref)
   }




Re: [R] Bug in print for data frames?

2023-10-26 Thread Rui Barradas

Hello,

Inline.

At 13:32 on 26/10/2023, Ebert,Timothy Aaron wrote:

> The "problem" goes away if you use
>
> x$C <- y[1,]


Actually, if I understand correctly, the OP wants the column:


x$C <- y[,1]


In this case it will produce the same output because y is a df with only 
one row. But that is a very special case, the general case would be to 
extract the column.
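
A short sketch of the different ways to pull that column out of y, using the objects from the original post. Only y[1] keeps the data.frame wrapper that led to the confusing print:

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

class(y[1])     # "data.frame": a one-column data frame
class(y[[1]])   # "numeric": the column itself
class(y[, 1])   # "numeric": same, a single column is dropped to a vector
class(y$A)      # "numeric"

x$C <- y[, 1]
names(x)        # "A" "B" "C": the third column is still called C
```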


Hope this helps,

Rui Barradas



> If you have another row in your x, say:
> x <- data.frame(A=c(1,4), B=c(2,5), C=c(3,6))
>
> then your code
> x$C <- y[1]
> returns an error.
>
> If y has the same number of rows as x$C then R has the same outcome as in your
> example.
>
> It looks like your code tells R to replace all of column C (including the name)
> with all of vector y.
>
> Maybe unexpected, but not a bug. It is consistent.


-Original Message-
From: R-help  On Behalf Of Rui Barradas
Sent: Thursday, October 26, 2023 6:43 AM
To: Christian Asseburg ; r-help@r-project.org
Subject: Re: [R] Bug in print for data frames?

[External Email]

At 07:18 on 25/10/2023, Christian Asseburg wrote:

Hi! I came across this unexpected behaviour in R. First I thought it was a bug in 
the assignment operator <- but now I think it's maybe a bug in the way data 
frames are being printed. What do you think?

Using R 4.3.1:


x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x

A B C
1 1 2 3

x$B <- y$A # works as expected
x

A B C
1 1 1 3

x$C <- y[1] # makes C disappear
x

A B A
1 1 1 1

str(x)

'data.frame':   1 obs. of  3 variables:
   $ A: num 1
   $ B: num 1
   $ C:'data.frame':  1 obs. of  1 variable:
..$ A: num 1

Why does the print(x) not show "C" as the name of the third element? I did mess 
up the data frame (and this was a mistake on my part), but finding the bug was harder 
because print(x) didn't show the C any longer.

Thanks. With best wishes -

. . . Christian


Hello,

To expand on the good answers already given, I will present two other example 
data sets.

Example 1. Imagine that instead of assigning just one column from y to x$C you 
assign two columns. The result is a data.frame column. See what is displayed as 
the column names.
And unlike what happens with `[`, when assigning columns 1:2, the operator 
`[[` doesn't work. You will have to extract the columns y$A and y$B one by one.



x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)
str(y)
#> 'data.frame':1 obs. of  2 variables:
#>  $ A: num 1
#>  $ B: num 4

x$C <- y[1:2]
x
#>   A B C.A C.B
#> 1 1 2   1   4

str(x)
#> 'data.frame':1 obs. of  3 variables:
#>  $ A: num 1
#>  $ B: num 2
#>  $ C:'data.frame':   1 obs. of  2 variables:
#>   ..$ A: num 1
#>   ..$ B: num 4

x[[1:2]]  # doesn't work
#> Error in .subset2(x, i, exact = exact): subscript out of bounds



Example 2. Sometimes it is useful to get a result like this first and then 
correct the resulting df. For instance, when computing more than one summary 
statistic.

str(agg) below shows that the column of summary stats is a matrix, so you have a 
matrix column. And once again the displayed names reflect that.

The trick to make the result a df is to extract all but the last column as a 
sub-df, extract the last column's values as a matrix (which it is) and then 
cbind the two together.

cbind is a generic function. Since the first argument to cbind is a sub-df, the 
method called is cbind.data.frame and the result is a df.



df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)

# the anonymous function computes more than one summary statistic
# note that it returns a named vector
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))
agg
#>   A X.Mean       X.S
#> 1 a 14.50  9.082951
#> 2 b 15.50  9.082951
#> 3 c 16.50  9.082951

# similar effect as in the OP. The difference is that the last
# column is a matrix, not a data.frame
str(agg)
#> 'data.frame':3 obs. of  2 variables:
#>  $ A: chr  "a" "b" "c"
#>  $ X: num [1:3, 1:2] 14.5 15.5 16.5 9.08 9.08 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Mean" "S"

# nc is just a 

Re: [R] Bug in print for data frames?

2023-10-26 Thread Ebert,Timothy Aaron
The "problem" goes away if you use

x$C <- y[1,]

If you have another row in your x, say:
x <- data.frame(A=c(1,4), B=c(2,5), C=c(3,6))

then your code
x$C <- y[1]
returns an error.

If y has the same number of rows as x$C then R has the same outcome as in your 
example.

It looks like your code tells R to replace all of column C (including the name) 
with all of vector y.

Maybe unexpected, but not a bug. It is consistent.


-Original Message-
From: R-help  On Behalf Of Rui Barradas
Sent: Thursday, October 26, 2023 6:43 AM
To: Christian Asseburg ; r-help@r-project.org
Subject: Re: [R] Bug in print for data frames?

[External Email]

At 07:18 on 25/10/2023, Christian Asseburg wrote:
> Hi! I came across this unexpected behaviour in R. First I thought it was a 
> bug in the assignment operator <- but now I think it's maybe a bug in the way 
> data frames are being printed. What do you think?
>
> Using R 4.3.1:
>
>> x <- data.frame(A = 1, B = 2, C = 3)
>> y <- data.frame(A = 1)
>> x
>A B C
> 1 1 2 3
>> x$B <- y$A # works as expected
>> x
>A B C
> 1 1 1 3
>> x$C <- y[1] # makes C disappear
>> x
>A B A
> 1 1 1 1
>> str(x)
> 'data.frame':   1 obs. of  3 variables:
>   $ A: num 1
>   $ B: num 1
>   $ C:'data.frame':  1 obs. of  1 variable:
>..$ A: num 1
>
> Why does the print(x) not show "C" as the name of the third element? I did 
> mess up the data frame (and this was a mistake on my part), but finding the 
> bug was harder because print(x) didn't show the C any longer.
>
> Thanks. With best wishes -
>
> . . . Christian
>
Hello,

To expand on the good answers already given, I will present two other example 
data sets.

Example 1. Imagine that instead of assigning just one column from y to x$C you 
assign two columns. The result is a data.frame column. See what is displayed as 
the column names.
And unlike what happens with `[`, when assigning columns 1:2, the operator 
`[[` doesn't work. You will have to extract the columns y$A and y$B one by one.



x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)
str(y)
#> 'data.frame':1 obs. of  2 variables:
#>  $ A: num 1
#>  $ B: num 4

x$C <- y[1:2]
x
#>   A B C.A C.B
#> 1 1 2   1   4

str(x)
#> 'data.frame':1 obs. of  3 variables:
#>  $ A: num 1
#>  $ B: num 2
#>  $ C:'data.frame':   1 obs. of  2 variables:
#>   ..$ A: num 1
#>   ..$ B: num 4

x[[1:2]]  # doesn't work
#> Error in .subset2(x, i, exact = exact): subscript out of bounds



Example 2. Sometimes it is useful to get a result like this first and then 
correct the resulting df. For instance, when computing more than one summary 
statistic.

str(agg) below shows that the column of summary stats is a matrix, so you have a 
matrix column. And once again the displayed names reflect that.

The trick to make the result a df is to extract all but the last column as a 
sub-df, extract the last column's values as a matrix (which it is) and then 
cbind the two together.

cbind is a generic function. Since the first argument to cbind is a sub-df, the 
method called is cbind.data.frame and the result is a df.



df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)

# the anonymous function computes more than one summary statistic
# note that it returns a named vector
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))
agg
#>   A    X.Mean       X.S
#> 1 a 14.500000  9.082951
#> 2 b 15.500000  9.082951
#> 3 c 16.500000  9.082951

# similar effect as in the OP. The difference is that the last
# column is a matrix, not a data.frame
str(agg)
#> 'data.frame':   3 obs. of  2 variables:
#>  $ A: chr  "a" "b" "c"
#>  $ X: num [1:3, 1:2] 14.5 15.5 16.5 9.08 9.08 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Mean" "S"

# nc is just a convenience, avoids repeated calls to ncol
nc <- ncol(agg)
cbind(agg[-nc], agg[[nc]])
#>   A Mean        S
#> 1 a 14.5 9.082951
#> 2 b 15.5 9.082951
#> 3 c 16.5 9.082951

# all is well
cbind(agg[-nc], agg[[nc]]) |> str()
#> 'data.frame':   3 obs. of  3 variables:
#>  $ A 

Re: [R] Bug in print for data frames?

2023-10-26 Thread Rui Barradas

At 07:18 on 25/10/2023, Christian Asseburg wrote:

Hi! I came across this unexpected behaviour in R. First I thought it was a bug in 
the assignment operator <- but now I think it's maybe a bug in the way data 
frames are being printed. What do you think?

Using R 4.3.1:


x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x

  A B C
1 1 2 3

x$B <- y$A # works as expected
x

  A B C
1 1 1 3

x$C <- y[1] # makes C disappear
x

  A B A
1 1 1 1

str(x)

'data.frame':   1 obs. of  3 variables:
 $ A: num 1
 $ B: num 1
 $ C:'data.frame':  1 obs. of  1 variable:
  ..$ A: num 1

Why does the print(x) not show "C" as the name of the third element? I did mess 
up the data frame (and this was a mistake on my part), but finding the bug was harder 
because print(x) didn't show the C any longer.

Thanks. With best wishes -

. . . Christian


Hello,

To expand on the good answers already given, I will present two other 
example data sets.


Example 1. Imagine that instead of assigning just one column from y to 
x$C you assign two columns. The result is a data.frame column. See what 
is displayed as the column names.
And unlike what happens with `[`, when assigning columns 1:2, the 
operator `[[` doesn't work. You will have to extract the columns y$A and 
y$B one by one.




x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)
str(y)
#> 'data.frame':   1 obs. of  2 variables:
#>  $ A: num 1
#>  $ B: num 4

x$C <- y[1:2]
x
#>   A B C.A C.B
#> 1 1 2   1   4

str(x)
#> 'data.frame':   1 obs. of  3 variables:
#>  $ A: num 1
#>  $ B: num 2
#>  $ C:'data.frame':   1 obs. of  2 variables:
#>   ..$ A: num 1
#>   ..$ B: num 4

x[[1:2]]  # doesn't work
#> Error in .subset2(x, i, exact = exact): subscript out of bounds
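
A minimal sketch of one way to end up with two ordinary columns instead of a nested data.frame column, assuming the goal was to copy both columns of y into x. The target names "C" and "D" are chosen only for illustration; `[<-` with a character vector of names assigns the columns of y[1:2] one to one.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)

# assign the two columns of y to two columns of x in one step;
# "C" is overwritten and "D" is a hypothetical new column name
x[c("C", "D")] <- y[1:2]

names(x)            # four plain column names, no C.A / C.B
sapply(x, class)    # every column is an ordinary numeric vector
```

With this form str(x) should no longer report a nested data.frame, because each target column receives a plain vector.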



Example 2. Sometimes it is useful to get a result like this first and 
then correct the resulting df. For instance, when computing more than 
one summary statistic.


str(agg)  below shows that the result summary stats is a matrix, so you 
have a column-matrix. And once again the displayed names reflect that.


The trick to make the result a df is to extract all but the last column 
as a sub-df, extract the last column's values as a matrix (which it is) 
and then cbind the two together.


cbind is a generic function. Since the first argument to cbind is a 
sub-df, the method called is cbind.data.frame and the result is a df.




df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)

# the anonymous function computes more than one summary statistic
# note that it returns a named vector
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))
agg
#>   A    X.Mean       X.S
#> 1 a 14.500000  9.082951
#> 2 b 15.500000  9.082951
#> 3 c 16.500000  9.082951

# similar effect as in the OP. The difference is that the last
# column is a matrix, not a data.frame
str(agg)
#> 'data.frame':   3 obs. of  2 variables:
#>  $ A: chr  "a" "b" "c"
#>  $ X: num [1:3, 1:2] 14.5 15.5 16.5 9.08 9.08 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Mean" "S"

# nc is just a convenience, avoids repeated calls to ncol
nc <- ncol(agg)
cbind(agg[-nc], agg[[nc]])
#>   A Mean        S
#> 1 a 14.5 9.082951
#> 2 b 15.5 9.082951
#> 3 c 16.5 9.082951

# all is well
cbind(agg[-nc], agg[[nc]]) |> str()
#> 'data.frame':   3 obs. of  3 variables:
#>  $ A   : chr  "a" "b" "c"
#>  $ Mean: num  14.5 15.5 16.5
#>  $ S   : num  9.08 9.08 9.08



If the anonymous function hadn't returned a named vector, the new column 
names would have been "1", "2". Try it.
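
Another idiom often used for the same purpose, sketched here as an alternative and not as part of the approach above, is do.call(data.frame, agg). It passes each column of agg as an argument to data.frame(), which expands the matrix column into separate columns prefixed with the original column name. The object name flat is only illustrative.

```r
df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))

# each element of agg becomes an argument of data.frame();
# the matrix column X is split into the columns X.Mean and X.S
flat <- do.call(data.frame, agg)

names(flat)
str(flat)
```

The difference from the cbind version is only in the names: here the prefix "X." is kept, which can be handy when several variables are summarised at once.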



Hope this helps,

Rui Barradas






Re: [R] Bug in print for data frames?

2023-10-26 Thread Duncan Murdoch

On 25/10/2023 2:18 a.m., Christian Asseburg wrote:

Hi! I came across this unexpected behaviour in R. First I thought it was a bug in 
the assignment operator <- but now I think it's maybe a bug in the way data 
frames are being printed. What do you think?

Using R 4.3.1:


x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x

  A B C
1 1 2 3

x$B <- y$A # works as expected
x

  A B C
1 1 1 3

x$C <- y[1] # makes C disappear
x

  A B A
1 1 1 1

str(x)

'data.frame':   1 obs. of  3 variables:
 $ A: num 1
 $ B: num 1
 $ C:'data.frame':  1 obs. of  1 variable:
  ..$ A: num 1

Why does the print(x) not show "C" as the name of the third element? I did mess 
up the data frame (and this was a mistake on my part), but finding the bug was harder 
because print(x) didn't show the C any longer.


y[1] is a dataframe with one column, i.e. it is identical to y.  To get 
the result you expected, you should have used y[[1]], to extract column 1.


Since dataframes are lists, you can assign them as columns of other 
dataframes, and you'll create a single column in the result whose rows 
are the columns of the dataframe you're assigning.  This means that


 x$C <- y[1]

replaces the C column of x with a dataframe.  It retains the name C (you 
can see this if you print names(x) ), but since the column contains a 
dataframe, it chooses to use the column name of y when printing.
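
The distinction can be checked directly. A small sketch, using only base R and the objects from the original post:

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

class(y[1])      # single brackets keep the data.frame wrapper
class(y[[1]])    # double brackets extract the column itself

x$C <- y[1]
names(x)         # the third name is still "C"
class(x$C)       # but the column now holds a data.frame

x$C <- y[[1]]    # the assignment that was probably intended
class(x$C)       # an ordinary numeric column again
```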


If you try

 x$D <- x

you'll see it generate new names when printing, but the names within x 
remain as A, B, C, D.


This is a situation where tibbles do a better job than dataframes:  if 
you created x and y as tibbles instead of dataframes and executed your 
code, you'd see this:


  library(tibble)
  x <- tibble(A = 1, B = 2, C = 3)
  y <- tibble(A = 1)
  x$C <- y[1]
  x
  #> # A tibble: 1 × 3
  #>       A     B   C$A
  #>   <dbl> <dbl> <dbl>
  #> 1     1     2     1

Duncan Murdoch



Re: [R] Bug in print for data frames?

2023-10-26 Thread Iris Simmons
I would say this is not an error, but I think what you wrote isn't
what you intended to do anyway.

y[1] is a data.frame which contains only the first column of y, which
you assign to x$C, so now x$C is a data.frame.

R allows the columns of a data.frame to be plain vectors as well as
matrices and data.frames, basically anything as long as it has the
correct length or nrow.

When the data.frame is formatted for printing, each column is
formatted and the results are then column-bound into another
data.frame using as.data.frame.list, so the third column takes the
name A because that's the name of the column from y.
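
To compare the stored names with the ones used for display, one can look at the result of format(), which as far as I understand is the intermediate data.frame that print() works from. A small sketch with the data from the original post:

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x$C <- y[1]

names(x)          # names stored in the data frame itself
names(format(x))  # names of the formatted, flattened copy; compare
                  # these with the header that print(x) shows
```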

I think what you meant to do is x$C <- y[[1]]  ## double brackets
instead of single

On Thu, Oct 26, 2023 at 4:14 AM Christian Asseburg  wrote:
>
> Hi! I came across this unexpected behaviour in R. First I thought it was a 
> bug in the assignment operator <- but now I think it's maybe a bug in the way 
> data frames are being printed. What do you think?
>
> Using R 4.3.1:
>
> > x <- data.frame(A = 1, B = 2, C = 3)
> > y <- data.frame(A = 1)
> > x
>   A B C
> 1 1 2 3
> > x$B <- y$A # works as expected
> > x
>   A B C
> 1 1 1 3
> > x$C <- y[1] # makes C disappear
> > x
>   A B A
> 1 1 1 1
> > str(x)
> 'data.frame':   1 obs. of  3 variables:
>  $ A: num 1
>  $ B: num 1
>  $ C:'data.frame':  1 obs. of  1 variable:
>   ..$ A: num 1
>
> Why does the print(x) not show "C" as the name of the third element? I did 
> mess up the data frame (and this was a mistake on my part), but finding the 
> bug was harder because print(x) didn't show the C any longer.
>
> Thanks. With best wishes -
>
> . . . Christian
>



[R] Bug in print for data frames?

2023-10-26 Thread Christian Asseburg
Hi! I came across this unexpected behaviour in R. First I thought it was a bug 
in the assignment operator <- but now I think it's maybe a bug in the way data 
frames are being printed. What do you think?

Using R 4.3.1:

> x <- data.frame(A = 1, B = 2, C = 3)
> y <- data.frame(A = 1)
> x
  A B C
1 1 2 3
> x$B <- y$A # works as expected
> x
  A B C
1 1 1 3
> x$C <- y[1] # makes C disappear
> x
  A B A
1 1 1 1
> str(x)
'data.frame':   1 obs. of  3 variables:
 $ A: num 1
 $ B: num 1
 $ C:'data.frame':  1 obs. of  1 variable:
  ..$ A: num 1

Why does the print(x) not show "C" as the name of the third element? I did mess 
up the data frame (and this was a mistake on my part), but finding the bug was 
harder because print(x) didn't show the C any longer.

Thanks. With best wishes -

. . . Christian
