Re: [R] Confused by error message: Error in assign(".popath", popath, .BaseNamespaceEnv)
On Tue, 12 Apr 2022 15:59:35 +1200 Tiffany Vidal wrote:

> devtools::install_github("MikkoVihtakari/ggOceanMapsData")
>
> Error in assign(".popath", popath, .BaseNamespaceEnv) :
>   cannot change value of locked binding for '.popath'
> Calls: local ... eval.parent -> eval -> eval -> eval -> eval -> assign

A full output of traceback() just after the error could be very useful here. This error message might indicate a bug in devtools or one of its dependencies (perhaps remotes). I don't know whether the developers of remotes or devtools lurk here, but it should be possible to reach them on GitHub: https://github.com/r-lib/devtools/issues

You could also check whether devtools or the package itself is to blame by downloading the code (either with git clone or by downloading the zip archive of the head of the master branch), then running R CMD build and R CMD INSTALL on the contents.

--
Best regards,
Ivan

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Confused about using data.table package,
aggregate(), tapply(), do.call(), rbind() (etc.) are extremely useful functions that have been available in R for a long time. They remain useful regardless of what plotting approach you use - base graphics, lattice or the more recent ggplot.

Philip

On 22/02/2017 8:40 AM, C W wrote:

Hi Carl,

I have not fully learned dplyr, but it seems harder than tapply() and the *apply() family in general. Almost every ggplot2 data set I have seen is manipulated using dplyr. Something must be good about dplyr.

aggregate(), tapply(), do.call(), rbind() will be sorely missed! :(

Thanks!

On Tue, Feb 21, 2017 at 4:21 PM, Carl Sutton wrote:

Hi

I have found that:
a) Hadley's new book is wonderful on how to use dplyr, ggplot2 and his other packages. Reading it and using it as a reference saves major frustration.
b) DataCamp's courses on ggplot2 are also wonderful. ggplot2 has more capability than I have mastered or needed. To be an expert with ggplot2 will take some effort; to get run-of-the-mill helpful, beautiful plots, no major time is needed.

I use both of these sources regularly, especially when what is in my grey matter memory banks is not working. Refreshers are sometimes needed.

If your data sets are large and available memory is limited, then data.table is the package I use. I am amazed at the difference in memory usage with data.table versus other packages. My laptop has 16 GB of RAM, and tidyr maxed it out, but data.table's melt used less than 6 GB (if I remember correctly) on my current work. Since discovering fread and fwrite, read.table, read.csv, and write have been benched. Every script I have includes library(data.table).

Carl Sutton
Re: [R] Confused about using data.table package,
Hi Carl,

I have not fully learned dplyr, but it seems harder than tapply() and the *apply() family in general. Almost every ggplot2 data set I have seen is manipulated using dplyr. Something must be good about dplyr.

aggregate(), tapply(), do.call(), rbind() will be sorely missed! :(

Thanks!

On Tue, Feb 21, 2017 at 4:21 PM, Carl Sutton wrote:
> Hi
>
> I have found that:
> a) Hadley's new book is wonderful on how to use dplyr, ggplot2 and his
> other packages. Reading it and using it as a reference saves major frustration.
> b) DataCamp's courses on ggplot2 are also wonderful. ggplot2 has more
> capability than I have mastered or needed. To be an expert with ggplot2
> will take some effort; to get run-of-the-mill helpful, beautiful
> plots, no major time is needed.
>
> I use both of these sources regularly, especially when what is in my grey
> matter memory banks is not working. Refreshers are sometimes needed.
>
> If your data sets are large and available memory is limited, then data.table
> is the package I use. I am amazed at the difference in memory usage with
> data.table versus other packages. My laptop has 16 GB of RAM, and tidyr maxed
> it out, but data.table's melt used less than 6 GB (if I remember correctly) on my
> current work. Since discovering fread and fwrite, read.table, read.csv,
> and write have been benched. Every script I have includes
> library(data.table)
>
> Carl Sutton
[R] Confused about using data.table package,
Hi

I have found that:
a) Hadley's new book is wonderful on how to use dplyr, ggplot2 and his other packages. Reading it and using it as a reference saves major frustration.
b) DataCamp's courses on ggplot2 are also wonderful. ggplot2 has more capability than I have mastered or needed. To be an expert with ggplot2 will take some effort; to get run-of-the-mill helpful, beautiful plots, no major time is needed.

I use both of these sources regularly, especially when what is in my grey matter memory banks is not working. Refreshers are sometimes needed.

If your data sets are large and available memory is limited, then data.table is the package I use. I am amazed at the difference in memory usage with data.table versus other packages. My laptop has 16 GB of RAM, and tidyr maxed it out, but data.table's melt used less than 6 GB (if I remember correctly) on my current work. Since discovering fread and fwrite, read.table, read.csv, and write have been benched. Every script I have includes library(data.table).

Carl Sutton
Re: [R] Confused about using data.table package,
Just. Don't. Do. This.

(Hint: Threading mail readers.)

On 21 Feb 2017, at 03:53, C W wrote:

> Thanks Hadley!
>
> While I got your attention, what is a good way to get started on ggplot2? ;)

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com
Re: [R] Confused about using data.table package,
I suspect Hadley would recommend reading his new book, R for Data Science (r4ds.had.co.nz), in particular Chapter 3. You don't need plyr, but it won't take long before you will want to be using dplyr and tidyr, which are covered in later chapters.
--
Sent from my phone. Please excuse my brevity.

On February 20, 2017 6:53:29 PM PST, C W wrote:
> Thanks Hadley!
>
> While I got your attention, what is a good way to get started on ggplot2? ;)
>
> My impression is that I first need to learn plyr, dplyr, AND THEN ggplot2. That's A LOT!
>
> Suppose I have this:
> iris
> iris2 <- cbind(iris, grade = sample(1:5, 150, replace = TRUE))
> iris2
>
> I want to have some kind of graph conditioned on species, by grade. What's a good lead to learn about plotting this?
>
> Thank you!
>
> On Mon, Feb 20, 2017 at 11:12 AM, Hadley Wickham wrote:
>> On Sun, Feb 19, 2017 at 3:01 PM, David Winsemius wrote:
>>>> On Feb 19, 2017, at 11:37 AM, C W wrote:
>>>>
>>>> Hi R,
>>>>
>>>> I am a little confused by the data.table package.
>>>>
>>>> library(data.table)
>>>>
>>>> df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1), y = rnorm(20, 10, 1),
>>>>                  z = rnorm(20, 20, 1))
>>>>
>>>> df <- data.table(df)
>>>
>>> df <- setDT(df) is preferred.
>>
>> Don't you mean just
>>
>> setDT(df)
>>
>> ?
>>
>> setDT() modifies by reference.
>>
>>>> df_3 <- df[, a := x-y]  # created new column a using x minus y, why are we
>>>> using colon equals?
>>>
>>> You need to do more study of the extensive documentation. The behavior
>>> of the ":=" function is discussed in detail there.
>>
>> You can get to that documentation with ?":="
>>
>> Hadley
>>
>> --
>> http://hadley.nz
Re: [R] Confused about using data.table package,
Thanks Hadley!

While I got your attention, what is a good way to get started on ggplot2? ;)

My impression is that I first need to learn plyr, dplyr, AND THEN ggplot2. That's A LOT!

Suppose I have this:
iris
iris2 <- cbind(iris, grade = sample(1:5, 150, replace = TRUE))
iris2

I want to have some kind of graph conditioned on species, by grade. What's a good lead to learn about plotting this?

Thank you!

On Mon, Feb 20, 2017 at 11:12 AM, Hadley Wickham wrote:
> On Sun, Feb 19, 2017 at 3:01 PM, David Winsemius wrote:
> >> On Feb 19, 2017, at 11:37 AM, C W wrote:
> >>
> >> Hi R,
> >>
> >> I am a little confused by the data.table package.
> >>
> >> library(data.table)
> >>
> >> df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1), y = rnorm(20, 10, 1),
> >>                  z = rnorm(20, 20, 1))
> >>
> >> df <- data.table(df)
> >
> > df <- setDT(df) is preferred.
>
> Don't you mean just
>
> setDT(df)
>
> ?
>
> setDT() modifies by reference.
>
> >> df_3 <- df[, a := x-y]  # created new column a using x minus y, why are we
> >> using colon equals?
> >
> > You need to do more study of the extensive documentation. The behavior
> > of the ":=" function is discussed in detail there.
>
> You can get to that documentation with ?":="
>
> Hadley
>
> --
> http://hadley.nz
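[Editor's note: one way to get the conditioned plot asked about above is faceting in ggplot2. A minimal sketch, assuming ggplot2 is installed; the choice of geom, aesthetics, and facet_wrap() is mine, not from the thread:]

```r
library(ggplot2)

# Reproducible version of the poster's data: iris plus a random 1-5 grade
set.seed(1)
iris2 <- cbind(iris, grade = sample(1:5, 150, replace = TRUE))

# One panel per Species, points coloured by grade
p <- ggplot(iris2, aes(x = Petal.Length, y = Petal.Width,
                       colour = factor(grade))) +
  geom_point() +
  facet_wrap(~ Species)
p
```

facet_grid(grade ~ Species) would instead give a full grid of grade-by-species panels.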
Re: [R] Confused about using data.table package,
> On Feb 20, 2017, at 8:12 AM, Hadley Wickham wrote:
>
> On Sun, Feb 19, 2017 at 3:01 PM, David Winsemius wrote:
>>
>>> On Feb 19, 2017, at 11:37 AM, C W wrote:
>>>
>>> Hi R,
>>>
>>> I am a little confused by the data.table package.
>>>
>>> library(data.table)
>>>
>>> df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1), y = rnorm(20, 10, 1),
>>>                  z = rnorm(20, 20, 1))
>>>
>>> df <- data.table(df)
>>
>> df <- setDT(df) is preferred.
>
> Don't you mean just
>
> setDT(df)
>
> ?
>
> setDT() modifies by reference.

Thanks for the correction.

>>> df_3 <- df[, a := x-y]  # created new column a using x minus y, why are we
>>> using colon equals?
>>
>> You need to do more study of the extensive documentation. The behavior of
>> the ":=" function is discussed in detail there.
>
> You can get to that documentation with ?":="

That's a good place to start reading, but I was thinking of the datatable-faq and datatable-intro vignettes, which are on the Vignettes page from: help(package = data.table).

>
> Hadley
>
> --
> http://hadley.nz

David Winsemius
Alameda, CA, USA
Re: [R] Confused about using data.table package,
On Sun, Feb 19, 2017 at 3:01 PM, David Winsemius wrote:
>
>> On Feb 19, 2017, at 11:37 AM, C W wrote:
>>
>> Hi R,
>>
>> I am a little confused by the data.table package.
>>
>> library(data.table)
>>
>> df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1), y = rnorm(20, 10, 1),
>>                  z = rnorm(20, 20, 1))
>>
>> df <- data.table(df)
>
> df <- setDT(df) is preferred.

Don't you mean just

setDT(df)

?

setDT() modifies by reference.

>> df_3 <- df[, a := x-y]  # created new column a using x minus y, why are we
>> using colon equals?
>
> You need to do more study of the extensive documentation. The behavior of the
> ":=" function is discussed in detail there.

You can get to that documentation with ?":="

Hadley

--
http://hadley.nz
Re: [R] Confused about using data.table package,
> On Feb 19, 2017, at 11:37 AM, C W wrote:
>
> Hi R,
>
> I am a little confused by the data.table package.
>
> library(data.table)
>
> df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1), y = rnorm(20, 10, 1),
>                  z = rnorm(20, 20, 1))
>
> df <- data.table(df)

df <- setDT(df) is preferred.

> #drop column w
>
> df_1 <- df[, w := NULL]  # I thought you are supposed to do: df_1 <- df[, -w]

Nope. The "[.data.table" function is very different from the "[.data.frame" function. As you should be able to see, an expression in the `j` position for "[.data.table" gets evaluated in the environment of the data.table object, so unquoted column names get returned after application of any function. Here it's just a unary minus.

Actually "nope" on two accounts. You cannot use a unary minus for column names in "[.data.frame" either. It would have needed to be:

df[ , !colnames(df) %in% "w"]  # logical indexing

> df_2 <- df[x

> df_3 <- df[, a := x-y]  # created new column a using x minus y, why are we
> using colon equals?

You need to do more study of the extensive documentation. The behavior of the ":=" function is discussed in detail there.

> I am a bit confused by this syntax.

It's non-standard for R, but many people find the efficiencies of the package worth the extra effort to learn what is essentially a different evaluation strategy.

> Thanks!

R-help is a plain text mailing list.

--
David

David Winsemius
Alameda, CA, USA
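[Editor's note: the two points made above - that setDT() converts in place, and that := adds or drops columns by reference - can be checked directly. A minimal sketch, assuming the data.table package is installed; the data are the poster's random columns:]

```r
library(data.table)

set.seed(42)
df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1),
                 y = rnorm(20, 10, 1),  z = rnorm(20, 20, 1))

setDT(df)          # converts to data.table in place; no df <- needed

df[, a := x - y]   # := creates (or modifies) a column by reference
df[, w := NULL]    # := with NULL drops a column by reference

# data.frame-style column dropping uses logical/name indexing instead:
# df[, !colnames(df) %in% "a", with = FALSE]
```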
[R] Confused about using data.table package,
Hi R,

I am a little confused by the data.table package.

library(data.table)

df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1), y = rnorm(20, 10, 1),
                 z = rnorm(20, 20, 1))

df <- data.table(df)

# drop column w
df_1 <- df[, w := NULL]  # I thought you are supposed to do: df_1 <- df[, -w]

df_2 <- df[x
[R] Confused by dlnorm - densities do not match histogram
Good evening!

I'm running into some surprising behavior with dlnorm() and trying to understand it. To set the stage, I'll plot the density and overlay a normal distribution. This works exactly as expected; the two graphs align quite closely:

qplot(data=data.frame(x=rnorm(1e5,4,2)), x=x, stat='density', geom='area') +
  stat_function(fun=dnorm, args=list(4,2), colour='blue')

but then I change to a log normal distribution and the behaviour gets odd. The distribution looks nothing like the density plot:

qplot(data=data.frame(x=rlnorm(1e5,4,2)), x=x, log='x', stat='density', geom='area') +
  stat_function(fun=dlnorm, args=list(4,2), colour='blue')

I thought the issue might be scale transformation - if dlnorm is giving the density per unit x, this is not the same as the density after transforming to log(x). So I tried to effect this scale transformation manually by dividing by the derivative of log(x) - i.e. by multiplying by x - but this also did not match:

qplot(data=data.frame(x=rlnorm(1e5,4,2)), x=x, log='x', stat='density', geom='area') +
  stat_function(fun=function(x,...){dlnorm(x,...)*x}, args=list(4,2), colour='blue')

I also tried plotting without the log scale to eliminate that transformation as a source of discrepancy, and they still don't match:

qplot(data=data.frame(x=rlnorm(1e5,4,2)), x=x, stat='density', geom='area', xlim=c(0,50)) +
  stat_function(fun=dlnorm, args=list(4,2), colour='blue')

I'd appreciate any help in understanding what I'm missing.
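[Editor's note: a base-R sanity check, independent of ggplot2's scale machinery - my own sketch, not from the thread. dlnorm() is a proper density, and the change-of-variables relation the poster was reaching for is dlnorm(x) * x == dnorm(log(x)), the density of log(X); this suggests the mismatch lies in how the plotted density is estimated and transformed, not in dlnorm() itself:]

```r
# dlnorm integrates to 1 over (0, Inf), as a density must
total <- integrate(dlnorm, 0, Inf, meanlog = 4, sdlog = 2)$value

# If X ~ lognormal(4, 2), then log(X) ~ Normal(4, 2). By the
# change-of-variables rule the density of log(X) at log(x) equals
# dlnorm(x) times the Jacobian |dx/dlog(x)| = x:
d_norm  <- dnorm(2, mean = 4, sd = 2)      # density of log(X) at 2
d_lnorm <- dlnorm(exp(2), 4, 2) * exp(2)   # dlnorm times x, at x = e^2

total    # close to 1
d_norm   # agrees with d_lnorm
d_lnorm
```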
Re: [R] Confused by code?
Hello,

It is pretty basic, and it is deceptively simple. The worst of all :)

When you index a matrix 'x' by another matrix 'z', the index can be a logical matrix of the same dimensions (or recyclable to the dims of 'x'), it can be a matrix with only two columns - a row numbers column and a column numbers one - or it can be your case. In your case, 'z' is coerced to a vector, and the values in 'z' are taken to be indexes into 'x'. But since you only have two distinct values and one of them is zero, it will only return x[1] three times (there are three 1s in 'z'). The same goes for 'y'.

Correct:

# Create an index matrix
z.inx <- which(z == 1, arr.ind = TRUE)
z.inx

# Test
x1 <- x2 <- x3 <- x  # Use copies to test
x1[z == 1] <- y[z == 1]
x2[z.inx] <- y[z.inx]
# 1 and 0 to T/F
x3[as.logical(z)] <- y[as.logical(z)]

x1
identical(x1, x2)
identical(x1, x3)

Hope this helps,

Rui Barradas

On 23-09-2012 21:52, Bazman76 wrote:

x <- matrix(c(1,0,0,0,1,0,0,0,1), nrow = 3)
y <- matrix(c(0,0,0,1,0,0,1,1,0), nrow = 3)
z <- matrix(c(0,1,0,0,1,0,1,0,0), nrow = 3)

x[z] <- y[z]

The resultant matrix x is all zeros except for the last two diagonal cells, which are 1's. While y is lower triangular 0's with the remaining cells all ones.

I really don't understand how this deceptively simple looking piece of code is giving that result. Can someone explain please? I'm obviously missing something pretty basic, so please keep your answer suitably basic.

--
View this message in context: http://r.789695.n4.nabble.com/Confused-by-code-tp4643946.html
Sent from the R help mailing list archive at Nabble.com.
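[Editor's note: the three indexing modes described above can be compared directly on the matrices from the thread. A minimal sketch:]

```r
x <- matrix(c(1,0,0, 0,1,0, 0,0,1), nrow = 3)
y <- matrix(c(0,0,0, 1,0,0, 1,1,0), nrow = 3)
z <- matrix(c(0,1,0, 0,1,0, 1,0,0), nrow = 3)

# Numeric-matrix index: z is flattened, so x[z] means
# x[c(0,1,0,0,1,0,1,0,0)]. Zeros select nothing and each 1 selects
# x[1], hence x[z] is x[1] repeated three times.
x[z]                              # c(1, 1, 1)

# Logical index: picks exactly the cells where z is 1
x[as.logical(z)]                  # c(0, 1, 0) -- cells 2, 5, 7 of x

# Row/column index matrix: the same cells, as (row, col) pairs
x[which(z == 1, arr.ind = TRUE)]  # c(0, 1, 0)
```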
Re: [R] Confused by code?
Thanks Rui Barradas and Peter Alspach, I understand better now:

x <- matrix(c(1,0,0,0,2,0,0,0,2), nrow = 3)
y <- matrix(c(7,8,9,1,5,10,1,1,0), nrow = 3)
z <- matrix(c(0,1,0,0,0,0,6,0,0), nrow = 3)

x[z] <- y[z]

viewData(x) produces an x matrix

7  0  0
0  2  0
0 10  2

which makes sense: the first element of y, 7, is inserted into slot x[1], and the 6th element of y, 10, is slotted into x[6].

However, the original code runs like this:

mI <- mRU(de.d, de.nP) > de.CR
mPV[mI] <- mP[mI]

where mPV and mP are both (de.d, de.nP) matrices, and

mRU <- function(m, n) {
  return(array(runif(m*n), dim = c(m, n)))
}

i.e. it returns an array of m*n random numbers uniformly distributed between 0 and 1. de.CR is a fixed value, say 0.8. So mI <- mRU(de.d, de.nP) > de.CR returns a de.d*de.nP array where each element is 1 if it is more than 0.8 and zero otherwise.

So in this case element mPV[1] will be repeatedly filled with the value of mP[1] and all other elements will remain unaffected? Is this correct? If so, I am still confused, as this is not what I thought was supposed to be happening, but I know that the code overall does its job correctly?

--
View this message in context: http://r.789695.n4.nabble.com/Confused-by-code-tp4643946p4644010.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Confused by code?
Hello,

Inline.

On 24-09-2012 15:31, Bazman76 wrote:
> Thanks Rui Barradas and Peter Alspach, I understand better now:
>
> x <- matrix(c(1,0,0,0,2,0,0,0,2), nrow = 3)
> y <- matrix(c(7,8,9,1,5,10,1,1,0), nrow = 3)
> z <- matrix(c(0,1,0,0,0,0,6,0,0), nrow = 3)
>
> x[z] <- y[z]
>
> viewData(x) produces an x matrix
>
> 7  0  0
> 0  2  0
> 0 10  2
>
> which makes sense: the first element of y, 7, is inserted into slot x[1],
> and the 6th element of y, 10, is slotted into x[6].
>
> However, the original code runs like this:
>
> mI <- mRU(de.d, de.nP) > de.CR
> mPV[mI] <- mP[mI]
>
> where mPV and mP are both (de.d, de.nP) matrices, and
>
> mRU <- function(m, n) {
>   return(array(runif(m*n), dim = c(m, n)))
> }
>
> i.e. it returns an array of m*n random numbers uniformly distributed
> between 0 and 1. de.CR is a fixed value, say 0.8. So mI <- mRU(de.d, de.nP) > de.CR
> returns a de.d*de.nP array where each element is 1 if it is more than 0.8
> and zero otherwise.
>
> So in this case element mPV[1] will be repeatedly filled with the value of
> mP[1] and all other elements will remain unaffected? Is this correct?

Yes and no, it should return a logical matrix, not a numeric one. Since it seems to be returning numbers 0/1, you can use as.logical like I've shown in my first post, or, maybe better,

mI <- which(mRU(de.d, de.nP) > de.CR, arr.ind = TRUE)

Like this you'll have an index matrix, whose purpose is precisely what its name says: to index matrices.

(I'm also a bit confused as to why the logical condition is returning numbers; are you sure of that?)

Anyway, the right way would be to index 'mPV' using a logical or an index matrix.

Hope this helps,

Rui Barradas

> If so, I am still confused, as this is not what I thought was supposed to be
> happening, but I know that the code overall does its job correctly?
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Confused-by-code-tp4643946p4644010.html
> Sent from the R help mailing list archive at Nabble.com.
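[Editor's note: a runnable check of the point made above. The names mRU, de.d, de.nP and de.CR come from the thread; the values chosen for mP and mPV are my own illustration. Comparing the runif matrix against de.CR yields a *logical* matrix, and indexing with it updates exactly the TRUE cells - the x[1]-repeated problem only arises if that result is first coerced to numeric 0/1:]

```r
set.seed(7)

# The thread's mRU: an m-by-n matrix of Uniform(0, 1) draws
mRU <- function(m, n) array(runif(m * n), dim = c(m, n))

de.d <- 4; de.nP <- 5; de.CR <- 0.8
mP  <- matrix(100, de.d, de.nP)   # source values (illustrative)
mPV <- matrix(0,   de.d, de.nP)   # target matrix (illustrative)

# A matrix-vs-scalar comparison gives a logical matrix, not 0/1 numbers
mI <- mRU(de.d, de.nP) > de.CR
is.logical(mI)                    # TRUE

# Logical-matrix indexing copies exactly the TRUE cells
mPV[mI] <- mP[mI]
sum(mPV == 100) == sum(mI)        # every selected cell was copied
```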
Re: [R] Confused by code?
I've just reread my answer and it's not very clear. Not at all. Inline.

On 24-09-2012 18:34, Rui Barradas wrote:
> Hello,
>
> Inline.
>
> On 24-09-2012 15:31, Bazman76 wrote:
>> [quoted question snipped]
>>
>> So in this case element mPV[1] will be repeatedly filled with the value of
>> mP[1] and all other elements will remain unaffected? Is this correct?
>
> Yes and no,

Yes, it is absolutely correct. As is, the matrix mI is coerced to a vector first, and then, since it only has values 0 and 1, the element mPV[1] will be repeatedly filled with the same value of mP[1]. The rest of my answer is right, though. But 'it', the very first word in my post after this comment, refers to what? To the condition that creates the index matrix mI - but this is not at all as clear as it should be.

> it should return a logical matrix, not a numeric one. Since it seems to be
> returning numbers 0/1, you can use as.logical like I've shown in my first
> post, or, maybe better,
>
> mI <- which(mRU(de.d, de.nP) > de.CR, arr.ind = TRUE)

Use this suggestion. It can't go wrong.

Rui Barradas

> Like this you'll have an index matrix, whose purpose is precisely what its
> name says: to index matrices.
>
> (I'm also a bit confused as to why the logical condition is returning
> numbers; are you sure of that?)
>
> Anyway, the right way would be to index 'mPV' using a logical or an index
> matrix.
>
> Hope this helps,
>
> Rui Barradas
[R] Confused by code?
x <- matrix(c(1,0,0,0,1,0,0,0,1), nrow = 3)
y <- matrix(c(0,0,0,1,0,0,1,1,0), nrow = 3)
z <- matrix(c(0,1,0,0,1,0,1,0,0), nrow = 3)

x[z] <- y[z]

The resultant matrix x is all zeros except for the last two diagonal cells, which are 1's. While y is lower triangular 0's with the remaining cells all ones.

I really don't understand how this deceptively simple looking piece of code is giving that result. Can someone explain please? I'm obviously missing something pretty basic, so please keep your answer suitably basic.

--
View this message in context: http://r.789695.n4.nabble.com/Confused-by-code-tp4643946.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Confused by code?
Tena koe

I think you probably meant:

x[as.logical(z)] <- y[as.logical(z)]

i.e., choosing those elements of x and y where z is 1 (TRUE as logical). Whereas what you have written:

x[z] <- y[z]

references the 0th element (indexing starts at 1, so this is empty; see x[0]) and the first element of x and y (repeatedly).

Hope this helps

Peter Alspach

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bazman76
Sent: Monday, 24 September 2012 8:53 a.m.
To: r-help@r-project.org
Subject: [R] Confused by code?

x <- matrix(c(1,0,0,0,1,0,0,0,1), nrow = 3)
y <- matrix(c(0,0,0,1,0,0,1,1,0), nrow = 3)
z <- matrix(c(0,1,0,0,1,0,1,0,0), nrow = 3)

x[z] <- y[z]

The resultant matrix x is all zeros except for the last two diagonal cells, which are 1's. While y is lower triangular 0's with the remaining cells all ones.

I really don't understand how this deceptively simple looking piece of code is giving that result. Can someone explain please? I'm obviously missing something pretty basic, so please keep your answer suitably basic.

--
View this message in context: http://r.789695.n4.nabble.com/Confused-by-code-tp4643946.html
Sent from the R help mailing list archive at Nabble.com.
[R] Confused about multiple imputation with rms or Hmisc packages
Hello,

I'm working on a Cox Proportional Hazards model for a cancer data set that has missing values for the categorical variable Grade in less than 10% of the observations. I'm not a statistician, but based on my readings of Frank Harrell's book, it seems to be a candidate for using multiple imputation technique(s). I understand the concepts behind imputation, but using the functions in rms and Hmisc is confounding me - for instance, whether to use transcan or aregImpute.

Here is a sample of my data: https://dl.dropbox.com/u/1852742/sample.csv

Drawing from Chapter 8 of Harrell's book, this is what I've been toying with:

# recurfree_survival_fromsx is survival time, rf_obs_sx codes for events as a binary variable.
# The CPH model I would like to fit, using Ograde_dx as the variable for overall grade at
# diagnosis, ord_nodes as an ordinal variable for the # lymph nodes involved.
obj = with(mydata, Surv(recurfree_survival_fromsx, rf_obs_sx))
mod = cph(obj ~ ord_nodes + Ograde_dx + ERorPR + HER2_Sum, data = mydata, x = T, y = T)

# Impute missing data
mydata.transcan = transcan(~ Ograde_dx + tumorsize + ord_nodes + simp_stage_path +
                             afam + Menopause + Age, imputed = T, n.impute = 10)
summary(mydata.transcan)

The issues I have are:

a) In your opinion(s), should I even be imputing this data? Is it appropriate here?
b) Even after reading the help pages and Harrell's book, I'm not sure I used the correct imputation method, and whether I should be using transcan or aregImpute.
c) In the output of summary(transcan), is R-squared the best value to describe how reliably the function could predict Ograde_dx? What is an acceptable level?
d) Do I use the function fit.mult.impute to fit my final cph model?

I appreciate your help with this as it is a somewhat confusing topic. I hope I gave you all the information you need to answer my questions.

Sincerely,
Jahan
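[Editor's note: for questions (b) and (d), the usual Hmisc/rms pairing is aregImpute() followed by fit.mult.impute(), which fits the model on each completed data set and pools the results. A sketch only, assuming the column names above exist in mydata; it is not runnable without the poster's data, and argument defaults should be checked against the package documentation:]

```r
library(rms)   # loads Hmisc as well

# Multiple imputation of the incomplete variables (10 completed data sets)
imp <- aregImpute(~ Ograde_dx + tumorsize + ord_nodes + simp_stage_path +
                    afam + Menopause + Age,
                  data = mydata, n.impute = 10)

# Fit the Cox model on each completed data set and pool (Rubin's rules)
fit <- fit.mult.impute(Surv(recurfree_survival_fromsx, rf_obs_sx) ~
                         ord_nodes + Ograde_dx + ERorPR + HER2_Sum,
                       cph, imp, data = mydata, x = TRUE, y = TRUE)
fit
```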
[R] confused with indexing
Dear all, I have code that looks like the following (I am sorry that this is not a reproducible example):

indexSkipped <- NULL
# ... code that might alter indexSkipped ...
if (length(indexSkipped) == 0)
  spatial_structure <- spatial_structures_from_measurements(DataList[[i]]$Lon, DataList[[i]]$Lat, meanVector)
else
  spatial_structure <- spatial_structures_from_measurements(DataList[[i]]$Lon[-indexSkipped], DataList[[i]]$Lat[-indexSkipped], meanVector)

What I am doing here is processing files. Every file has a measurement table and Longitude and Latitude fields. If a file is marked as invalid, I keep the skipped index so as to remove the corresponding elements of the Longitude and Latitude vectors.

1) That works correctly; I was just wondering whether it would be possible to remove the if statement somehow and initialize indexSkipped in such a way that DataList[[i]]$Lon[-indexSkipped] and DataList[[i]]$Lat[-indexSkipped] do nothing (i.e. remove no elements) when indexSkipped remains at its initial value.

2) When I define a variable as empty I usually use NULL; how can I check afterwards whether that still holds? (indexSkipped == NULL) returns logical(0), not TRUE or FALSE. How can I do that check?

I would like to thank you in advance for your help.
B.R.
Alex
Re: [R] confused with indexing
Use is.null for the test:

if (is.null(indexSkipped)) ...

Sent from my iPad

On May 22, 2012, at 2:10, Alaios ala...@yahoo.com wrote: [original question quoted]
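[A small illustration of both points, for the archive. Note that negative indexing cannot express "drop nothing" - x[-integer(0)] selects zero elements rather than all of them - so a logical keep-mask is one way to answer question 1 as well. A minimal sketch:]

```r
x <- c(10, 20, 30, 40)

# Question 2: test emptiness with is.null(), not `== NULL`
indexSkipped <- NULL
is.null(indexSkipped)          # TRUE
indexSkipped == NULL           # logical(0) -- unusable in if()

# Question 1: a logical keep-mask handles both cases without an if():
keep <- !(seq_along(x) %in% indexSkipped)
x[keep]                        # 10 20 30 40 (nothing dropped)

indexSkipped <- c(2, 4)
keep <- !(seq_along(x) %in% indexSkipped)
x[keep]                        # 10 30
```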
[R] Confused: Inconsistent result?
This is copy-paste from my session:

> xyz <- as.vector(c(ls(), as.matrix(lapply(ls(), class))))
> dim(xyz) <- c(length(xyz)/2, 2)
> allobj <- function(){
+   xyz <- as.vector(c(ls(), as.matrix(lapply(ls(), class))));
+   dim(xyz) <- c(length(xyz)/2, 2);
+   return(xyz)
+ }
> xyz
      [,1]              [,2]
 [1,] "a"               "character"
 [2,] "aa"              "character"
 [3,] "abc"             "character"
 [4,] "AirPassengers"   "character"
 [5,] "allobj"          "character"
 [6,] "allObjects"      "character"
 [7,] "allObjects2"     "character"
 [8,] "arrayFromAPL"    "character"
 [9,] "classes"         "character"
[10,] "myCharVector"    "character"
[11,] "myDateVector"    "character"
[12,] "myNumericVector" "character"
[13,] "newArrayFromAPL" "character"
[14,] "obj"             "character"
[15,] "objClass"        "character"
[16,] "x"               "character"
[17,] "xyz"             "character"
[18,] "y"               "character"
> allobj()
     [,1] [,2]

As far as I can see, the function allobj has the same expressions as those executed from the command line. Why are the results different?
Re: [R] Confused: Inconsistent result?
On Feb 20, 2012, at 10:07 AM, Ajay Askoolum wrote: [original question quoted]

The ls function looks only in the local environment if not supplied with specific directions about where to look.

David Winsemius, MD
West Hartford, CT
Re: [R] Confused: Inconsistent result?
Sorry, just checked it and you need to add .GlobalEnv to both ls() calls.

Michael

On Mon, Feb 20, 2012 at 10:17 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote: [previous reply quoted]
Re: [R] Confused: Inconsistent result?
Short answer: environments. ls() looks (by default) in its current environment, which is not the same as the global environment when called inside a function. This would (I think) give the same answer, but I haven't checked it:

allobj <- function(){
  xyz <- as.vector(c(ls(.GlobalEnv), as.matrix(lapply(ls(), class))));
  dim(xyz) <- c(length(xyz)/2, 2);
  return(xyz)
}

On Mon, Feb 20, 2012 at 10:07 AM, Ajay Askoolum aa2e...@yahoo.co.uk wrote: [original question quoted]
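[The environment behaviour described here can be seen directly in a fresh session - a minimal sketch:]

```r
x <- 1
y <- 2

f <- function() {
  local_only <- 3
  list(inside = ls(),             # the function's own environment
       global = ls(.GlobalEnv))   # explicitly the workspace
}

f()$inside   # "local_only" -- only what was created inside f()
f()$global   # includes "f", "x", "y" -- the top-level objects
```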
Re: [R] Confused: Inconsistent result?
Hi

[original question quoted]

Probably due to environment handling. Do you really want to check whether ls behaves as intended and produces a character vector? Or is your intention a bit more ambitious - do you want to know what objects you have?
If the latter, I recommend this function:

function (pos = 1, pattern, order.by)
{
    napply <- function(names, fn) sapply(names, function(x) fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x) as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.dim)
    names(out) <- c("Type", "Size", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]]), ]
    out
}
[R] Confused with Student's sleep data description
I am confused about whether Student's sleep data show the effect of two soporific drugs, or of Control against Treatment (one drug). The reason is the following:

> require(stats)
> data(sleep)
> attach(sleep)
> extra[group == 1]
numeric(0)
> group
 [1] Ctl Ctl Ctl Ctl Ctl Ctl Ctl Ctl Ctl Ctl Trt Trt Trt Trt Trt Trt Trt Trt Trt
[20] Trt
Levels: Ctl Trt
> sleep$group
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
Levels: 1 2

Does some package overwrite my attach()? I am worried mostly about my code running correctly for others. So should attach() be avoided? Thanks for any answers!
Re: [R] Confused with Student's sleep data description
It doesn't have anything to do with attach (which is naughty in other ways!); rather it's the internal representation of categorical variables (R speak: factors), which store each level as an integer for memory efficiency but print with string labels so they look nice to the user. You'll note there's a 1-to-1 match between Ctl-1 and Trt-2 in your data. The funny business (best I reckon) is that use of $, which down-grades your data to its internal representation as a numeric (integer) vector.

But yes, you should avoid attach anyway.

M

On Jan 27, 2012, at 6:03 AM, Олег Девіняк o.devin...@gmail.com wrote: [original question quoted]
Re: [R] Confused with Student's sleep data description
On Jan 27, 2012, at 17:18, R. Michael Weylandt wrote: [previous reply about factor internals quoted]

Rubbish! There must be more to this:

> data(sleep)
> sleep$group
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
Levels: 1 2
> attach(sleep)
> group
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
Levels: 1 2

Presumably there's a group variable with different factor levels sitting in the global environment. $ certainly will not down-grade data to integers (much less keep them as factors but modify the level set).

-pd
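[Both halves of this exchange - factor internals, and a global variable masking an attached column - can be demonstrated in a fresh session. A minimal sketch:]

```r
# Factors store integer codes internally but print string labels:
g <- factor(c("Ctl", "Ctl", "Trt"))
levels(g)        # "Ctl" "Trt"
as.integer(g)    # 1 1 2

# A leftover `group` in the workspace masks the attached column,
# because the global environment comes before attached data frames
# in the search path:
group <- factor(c("Ctl", "Trt"))     # stale object in .GlobalEnv
attach(sleep)
identical(group, sleep$group)        # FALSE -- the global one wins
detach(sleep)
```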
-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School, Solbjerg Plads 3, 2000 Frederiksberg, Denmark. Phone: (+45)38153501. Email: pd@cbs.dk  Priv: pda...@gmail.com
Re: [R] Confused with an error message related to plotrix library in the newer versions of R.
On 11/14/2011 05:59 PM, Prasanth V P wrote:

require(plotrix)
xy.pop <- c(17,15,13,11,9,8,6,5,4,3,2,2,1,3)
xx.pop <- c(17,14,12,11,11,8,6,5,4,3,2,2,2,3)
agelabels <- c("0-4","5-9","10-14","15-19","20-24","25-29","30-34",
  "35-39","40-44","45-49","50-54","55-59","60-64","65+")
xycol <- color.gradient(c(0,0,0.5,0.15),c(0.25,0.5,0.5,1.75),c(0.5,1.5,1,0),18)
xxcol <- color.gradient(c(0,1,0.5,1),c(0.25,0.5,0.5,1.25),c(0.5,0.25,0.5,1.5),18)
par(mar=pyramid.plot(xy.pop,xx.pop,labels=agelabels, labelcex=1.125,
  main="Population Pyramid -- Malawi", xycol=xycol, xxcol=xxcol))

Hi Prasanth V P,
Just a typo. Try this:

par(mar=pyramid.plot(xy.pop,xx.pop,labels=agelabels,labelcex=1.125,
  main="Population Pyramid -- Malawi", lxcol=xycol, rxcol=xxcol))

Nice plot.
Jim
Re: [R] Confused with an error message related to plotrix library in the newer versions of R.
Hi Jim,

It's working perfectly fine with the rxcol parameter. I am just wondering how I could miss that! By the way, many thanks for pointing it out; otherwise I would have kept using the old version of R just to get the required plot.

Much appreciated,
Prasanth.

-----Original Message-----
From: Jim Lemon [mailto:j...@bitwrit.com.au]
Sent: 14 November 2011 13:39
To: Prasanth V P
Cc: r-help@r-project.org
Subject: Re: [R] Confused with an error message related to plotrix library in the newer versions of R.
[previous reply quoted]
[R] Confused with an error message related to plotrix library in the newer versions of R.
Dear R Users,

Greetings! I am confused by an error message related to the plotrix library in the newer versions of R. I used to run an R script without fail in earlier versions (R 2.8.1), but the same script now throws an error in the newer versions (I now have R 2.13.0 and R 2.14.0). Herewith I furnish the same code for your perusal. It would be great if somebody could look into this matter and explain in detail.

require(plotrix)
xy.pop <- c(17,15,13,11,9,8,6,5,4,3,2,2,1,3)
xx.pop <- c(17,14,12,11,11,8,6,5,4,3,2,2,2,3)
agelabels <- c("0-4","5-9","10-14","15-19","20-24","25-29","30-34",
  "35-39","40-44","45-49","50-54","55-59","60-64","65+")
xycol <- color.gradient(c(0,0,0.5,0.15),c(0.25,0.5,0.5,1.75),c(0.5,1.5,1,0),18)
xxcol <- color.gradient(c(0,1,0.5,1),c(0.25,0.5,0.5,1.25),c(0.5,0.25,0.5,1.5),18)
par(mar=pyramid.plot(xy.pop,xx.pop,labels=agelabels, labelcex=1.125,
  main="Population Pyramid -- Malawi", xycol=xycol, xxcol=xxcol))

Much appreciated,
Prasanth, V.P.
Global Manager - Biometrics
Delta Technology Management Services Pvt Ltd, Plot No: 13/2, Sector - I, Third Floor, HUDA Techno Enclave, Madhapur, Hyderabad - 500 081.
Re: [R] Confused about a warning message
On Jul 7, 2011, at 8:47 PM, Gang Chen wrote:

I define the following function to convert a t-value with degrees of freedom DF to another t-value with different degrees of freedom fullDF:

tConvert <- function(tval, DF, fullDF) ifelse(DF >= 1, qt(pt(tval, DF), fullDF), 0)

It works as expected with the following case:

> tConvert(c(2,3), c(10,12), 12)
[1] 1.961905 3.000000

However, it gives me a warning for the example below, although the output is still as intended:

> tConvert(c(2,3), c(0,12), 12)
[1] 0 3
Warning message:
In pt(q, df, lower.tail, log.p) : NaNs produced

I'm confused about the warning, especially considering that the following works correctly without such a warning:

> tConvert(2, 0, 12)
[1] 0

What am I missing?

The fact that ifelse evaluates both sides of the consequent and alternative.

David Winsemius, MD
West Hartford, CT
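[The point that ifelse() evaluates its yes/no arguments over the full vectors, and only then selects elementwise, is easy to demonstrate - a minimal sketch:]

```r
x <- c(-4, 9)

# sqrt(x) is computed for *both* elements before selection,
# so the negative element still triggers "NaNs produced":
ifelse(x >= 0, sqrt(x), 0)      # 0 3, with a warning

# Masking the input first avoids evaluating sqrt on invalid values:
sqrt(ifelse(x >= 0, x, 0))      # 0 3, no warning
```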
[R] Confused about a warning message
I define the following function to convert a t-value with degrees of freedom DF to another t-value with different degrees of freedom fullDF:

tConvert <- function(tval, DF, fullDF) ifelse(DF >= 1, qt(pt(tval, DF), fullDF), 0)

It works as expected with the following case:

> tConvert(c(2,3), c(10,12), 12)
[1] 1.961905 3.000000

However, it gives me a warning for the example below, although the output is still as intended:

> tConvert(c(2,3), c(0,12), 12)
[1] 0 3
Warning message:
In pt(q, df, lower.tail, log.p) : NaNs produced

I'm confused about the warning, especially considering that the following works correctly without such a warning:

> tConvert(2, 0, 12)
[1] 0

What am I missing?

Thanks,
Gang
Re: [R] Confused about a warning message
On Jul 7, 2011, at 8:52 PM, David Winsemius wrote: [previous reply quoted]

I also think you should update your R to the most recent version, since a current version does not issue that warning.

-- David Winsemius, MD
West Hartford, CT
Re: [R] Confused about a warning message
Thanks for the help! Are you sure the R version plays a role in this case? My R version is 2.13.0.

Your suggestion prompted me to look into the help page for ifelse, and a similar example exists there:

x <- c(6:-4)
sqrt(x)                       #- gives warning
sqrt(ifelse(x >= 0, x, NA))   # no warning
## Note: the following also gives the warning!
ifelse(x >= 0, sqrt(x), NA)

Based on the above example, now I have a solution for my situation:

tConvert2 <- function(tval, DF, fullDF)
  qt(pt(ifelse(DF >= 1, tval, 0), ifelse(DF >= 1, DF, 1)), fullDF)

> tConvert2(c(2,3), c(0,12), 12)
[1] 0 3

However, I feel my solution is a little kludged. Any better idea?

Thanks,
Gang

On Thu, Jul 7, 2011 at 9:04 PM, David Winsemius dwinsem...@comcast.net wrote: [previous replies quoted]
Re: [R] Confused about a warning message
On Jul 7, 2011, at 10:17 PM, Gang Chen wrote:

Thanks for the help! Are you sure the R version plays a role in this case? My R version is 2.13.0.

I'm not sure, but my version is 2.13.1.

Your suggestion prompted me to look into the help page for ifelse, and a similar example exists there:

x <- c(6:-4)
sqrt(x)                       #- gives warning
sqrt(ifelse(x >= 0, x, NA))   # no warning

The x variable gets converted to c(6:0, NA, NA, NA, NA). Notice the differences here:

> sqrt(NA)
[1] NA
> sqrt(-1)
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
> qt(.5, 0)
[1] NaN
Warning message:
In qt(p, df, lower.tail, log.p) : NaNs produced
> qt(.5, NA)
[1] NA

[rest of previous message quoted]

David Winsemius, MD
West Hartford, CT
Re: [R] confused by lapply
On 2011-02-16 09:42, Sam Steingold wrote:

Description: 'lapply' returns a list of the same length as 'X', each element of which is the result of applying 'FUN' to the corresponding element of 'X'.

I expect that when I do lapply(vec, f), f would be called _once_ for each component of vec. This is not what I see:

parse.num <- function (s) {
  cat("parse.num1\n"); str(s)
  s <- as.character(s)
  cat("parse.num2\n"); str(s)
  if (s == "N/A") return(s)
  as.numeric(gsub("M$", "e6", gsub("B$", "e9", s)))
}

> vec
     mcap
1  200.5B
2   19.1M
3  223.7B
4  888.0M
5  141.7B
6  273.5M
7 55.649B
> str(vec)
'data.frame': 7 obs. of 1 variable:
 $ mcap: Factor w/ 7 levels "141.7B","19.1M",..: 3 2 4 7 1 5 6
> vec <- lapply(vec, parse.num)
parse.num1
 Factor w/ 7 levels "141.7B","19.1M",..: 3 2 4 7 1 5 6
parse.num2
 chr [1:7] "200.5B" "19.1M" "223.7B" "888.0M" "141.7B" "273.5M" ...
Warning message:
In if (s == "N/A") return(s) :
  the condition has length > 1 and only the first element will be used

i.e., somehow parse.num is called on the whole vector vec, not its components. What am I doing wrong?

Your 'vec' is NOT a vector. As your str(vec) clearly shows, you have a *data.frame*. The components of a data frame are its columns (variables), of which you have only one, and your function is applied to that. If you had two columns, parse.num would be applied to each column. So do this:

lapply(vec[, 1], parse.num)

Peter Ehlers
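[The point in miniature - a sketch with made-up values mirroring the post: a data frame is a list of columns, so lapply() iterates over columns, not cells:]

```r
vec <- data.frame(mcap = c("200.5B", "19.1M", "888.0M"))

# One call of FUN per *column* of the data frame:
length(lapply(vec, length))        # 1

# Extracting the column first gives one call per element:
parsed <- lapply(as.character(vec[, 1]), function(s)
  as.numeric(gsub("M$", "e6", gsub("B$", "e9", s))))
unlist(parsed)                     # 2.005e+11 1.91e+07 8.88e+08
```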
[R] confused by lapply
Description: 'lapply' returns a list of the same length as 'X', each element of which is the result of applying 'FUN' to the corresponding element of 'X'. I expect that when I do lapply(vec, f), f would be called _once_ for each component of vec. This is not what I see:

parse.num <- function (s) {
  cat("parse.num1\n"); str(s)
  s <- as.character(s)
  cat("parse.num2\n"); str(s)
  if (s == "N/A") return(s)
  as.numeric(gsub("M$", "e6", gsub("B$", "e9", s)))
}

> vec
     mcap
1  200.5B
2   19.1M
3  223.7B
4  888.0M
5  141.7B
6  273.5M
7 55.649B
> str(vec)
'data.frame': 7 obs. of 1 variable:
 $ mcap: Factor w/ 7 levels "141.7B","19.1M",..: 3 2 4 7 1 5 6
> vec <- lapply(vec, parse.num)
parse.num1
 Factor w/ 7 levels "141.7B","19.1M",..: 3 2 4 7 1 5 6
parse.num2
 chr [1:7] "200.5B" "19.1M" "223.7B" "888.0M" "141.7B" "273.5M" ...
Warning message:
In if (s == "N/A") return(s) :
  the condition has length > 1 and only the first element will be used

i.e., somehow parse.num is called on the whole vector vec, not its components. What am I doing wrong? -- Sam Steingold (http://sds.podval.org/) on CentOS release 5.3 (Final) http://dhimmi.com http://mideasttruth.com http://truepeace.org http://camera.org http://memri.org http://palestinefacts.org http://iris.org.il Despite the rising cost of living, it remains quite popular. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Confused
Hi, I'm confused by one thing, and if someone can explain it I would be happy.

rev(strsplit("hej", NULL))
[[1]]
[1] "h" "e" "j"

lapply(strsplit("hej", NULL), rev)
[[1]]
[1] "j" "e" "h"

Why doesn't the first one work? What is it in R that fails, so to say, such that you need to use lapply to get the correct output? -- View this message in context: http://r.789695.n4.nabble.com/Confused-tp3263700p3263700.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Confused
On 2011-02-07 00:18, Joel wrote: Hi, I'm confused by one thing, and if someone can explain it I would be happy.

rev(strsplit("hej", NULL))
[[1]]
[1] "h" "e" "j"

lapply(strsplit("hej", NULL), rev)
[[1]]
[1] "j" "e" "h"

Why doesn't the first one work? What is it in R that fails, so to say, such that you need to use lapply to get the correct output?

See if this helps to see what's happening in the first case:

L <- list(fruit = c("apple", "orange"))
L
rev(L)
L <- list(fruit = c("apple", "orange"), nuts = c("pecan", "almond"))
L
rev(L)
lapply(L, rev)

For your second case, lapply() applies FUN to the pieces of the list. Peter Ehlers __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
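[Editor's illustration, not part of the original thread.] A minimal sketch of the distinction being pointed at here: rev() on a list reverses the order of the list's elements, while lapply(L, rev) reverses the contents inside each element.

```r
L <- list(fruit = c("apple", "orange"), nuts = c("pecan", "almond"))

rev(L)         # list order flipped: nuts first, then fruit; contents untouched
lapply(L, rev) # same list order, but c("orange", "apple") etc. inside each

# strsplit("hej", NULL) is a one-element list, so rev() of it is a no-op:
identical(rev(strsplit("hej", NULL)), strsplit("hej", NULL)) # TRUE
```

This is why rev() alone appears to "do nothing" on the strsplit result: there is only one element whose order could be reversed.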
Re: [R] Confused
On 07-Feb-11 08:18:49, Joel wrote: Hi, I'm confused by one thing, and if someone can explain it I would be happy.

rev(strsplit("hej", NULL))
[[1]]
[1] "h" "e" "j"

lapply(strsplit("hej", NULL), rev)
[[1]]
[1] "j" "e" "h"

Why doesn't the first one work? What is it in R that fails such that you need to use lapply to get the correct output? -- What's causing the confusion in your example is that the result of strsplit("hej", NULL) consists of only one element. This is because (see ?strsplit) the value of strsplit is a *list*. For example, if you submit a character *vector* (with 2 elements "hej" and "nej") to your rev(strsplit(...)):

strsplit(c("hej", "nej"), NULL)
# [[1]]
# [1] "h" "e" "j"
#
# [[2]]
# [1] "n" "e" "j"

rev(strsplit(c("hej", "nej"), NULL))
# [[1]]
# [1] "n" "e" "j"
#
# [[2]]
# [1] "h" "e" "j"

you now get a list with 2 elements [[1]] and [[2]], and rev() now outputs these in reverse order. With your character vector "hej", which has only one element, you get a list with only one element, and the rev() of this is exactly the same. Your lapply(strsplit("hej", NULL), rev) applies rev() to each element of the list returned by strsplit, so even if it only has one element, that element gets its contents reversed.

lapply(strsplit(c("hej", "nej"), NULL), rev)
# [[1]]
# [1] "j" "e" "h"
#
# [[2]]
# [1] "j" "e" "n"

Hoping this helps! Ted. E-Mail: (Ted Harding) ted.hard...@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 07-Feb-11 Time: 08:56:55 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Confused: Looping in dataframes
Hey, I have a data frame x which consists of, say, 10 vectors. I essentially want to find the best-fit exponential smoothing for each of the vectors. The problem: while I'm getting results when I say lapply(x, ets), I am getting an error when I say

myprint <- function(x) {
  for (i in 1:length(x)) {
    ets(x[i], model = "AZZ", opt.crit = c("amse"))
  }
}

The error message is: Error in ets(x[i], model = "AZZ", opt.crit = c("amse")) : y should be a univariate time series. Could someone please explain why this is happening? I also want to be able to extract data like coefs and errors (MAPE, MSE etc.). Thanks and regards, Phani -- A. Phani Kishan 3rd Year B.Tech Dept. of Computer Science Engineering IIT MADRAS Ph: +919962363545 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Confused: Looping in dataframes
On 06/25/2010 10:02 AM, phani kishan wrote: Hey, I have a data frame x which consists of, say, 10 vectors. I essentially want to find the best-fit exponential smoothing for each of the vectors. The problem: while I'm getting results when I say lapply(x, ets), I am getting an error when I say

myprint <- function(x) {
  for (i in 1:length(x)) {
    ets(x[i], model = "AZZ", opt.crit = c("amse"))

Hi, Please provide a reproducible example, as stated in the posting guide. My guess is that replacing x[i] by x[[i]] would solve the problem. Double brackets return a vector instead of a data.frame containing just column i. cheers, Paul

  }
}

The error message is: Error in ets(x[i], model = "AZZ", opt.crit = c("amse")) : y should be a univariate time series. Could someone please explain why this is happening? I also want to be able to extract data like coefs and errors (MAPE, MSE etc.). Thanks and regards, Phani -- Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +3130 253 5773 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
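[Editor's illustration, not part of the original thread.] The single- vs double-bracket distinction Paul describes can be checked directly on a small, invented data frame:

```r
x <- data.frame(a = 1:3, b = 4:6)

class(x["a"])    # "data.frame" - single brackets keep the frame structure
class(x[["a"]])  # "integer"    - double brackets extract the column itself

# Functions that demand a univariate series (such as ets()) need the
# extracted vector, not a one-column data frame:
is.numeric(x[["a"]]) # TRUE
is.numeric(x["a"])   # FALSE
```

The same holds with positional indexing: x[1] is a one-column data frame, x[[1]] is the underlying vector.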
Re: [R] Confused: Looping in dataframes
On Fri, Jun 25, 2010 at 1:54 PM, Paul Hiemstra p.hiems...@geo.uu.nl wrote: On 06/25/2010 10:02 AM, phani kishan wrote: Hey, I have a data frame x which consists of say 10 vectors. I essentially want to find out the best fit exponential smoothing for each of the vectors. The problem while I'm getting results when i say lapply(x,ets) I am getting an error when I say myprint function(x) { for(i in 1:length(x)) { ets(x[i],model=AZZ,opt.crit=c(amse)) Hi, Please provide a reproducible example, as stated in the posting guide. My guess is that replacing x[i] by x[[i]] would solve the problem. Double brackets return a vector in stead of a data.frame that has just column i. Hey Paul, As requested. My example data frame sdata: SKU1SKU2 SKU3 SKU4 1 583.8 574.6 1106.9 648.1 2 441.7 552.8 1021.3 353.6 3 454.2 555.7 998.3 306.4 4 569.7 507.6 811.1 360.7 5 512.3 620.0 1046.3 713.9 6 580.8 668.2 732.0 490.9 7 648.5 766.9 653.4 422.1 8 617.4 657.1 602.1 190.8 9 826.8 767.3 640.5 324.1 10 1163.0 657.6 429.6 181.1 11 643.5 788.9 569.1 331.9 12 846.9 568.6 425.1 224.6 13 580.7 582.9 434.2 226.9 now when I apply lapply(sdata,ets) I get a result as: $SKU1 ETS(A,N,N) Call: ets(y = x, model = AZZ) Smoothing parameters: alpha = 0.3845 Initial states: l = 533.3698 sigma: 181.7615 AIC AICc BIC 172.6144 173.8144 173.7443 $SKU2 ETS(A,N,N) Call: ets(y = x, model = AZZ) Smoothing parameters: alpha = 0.5026 Initial states: l = 567.821 sigma: 86.7074 AIC AICc BIC 153.3704 154.5704 154.5003 $SKU3 ETS(A,A,N) Call: ets(y = x, model = AZZ) Smoothing parameters: alpha = 1e-04 beta = 1e-04 Initial states: l = 1189.2221 b = -64.3776 sigma: 85.4153 AIC AICc BIC 156.9800 161.9800 159.2398 $SKU4 ETS(A,A,N) Call: ets(y = x, model = AZZ) Smoothing parameters: alpha = 1e-04 beta = 1e-04 Initial states: l = 566.9001 b = -27.8818 sigma: 127.2654 AIC AICc BIC 167.3475 172.3475 169.6073 Now when I run the same using: myfun-function(x) { for(i in 1:length(x)) { ets(x[i]) } } I got the error as mentioned before. 
Now on modifying it to

myfun <- function(x) {
  for (i in 1:length(x)) {
    return(ets(x[[i]]))
  }
}

I only got the output as

ETS(A,N,N) Call: ets(y = x[[i]], model = "AZZ", opt.crit = c("amse")) Smoothing parameters: alpha = 0.3983 Initial states: l = 516.188 sigma: 181.8688 AIC AICc BIC 172.6298 173.8298 173.7597

I think it's considering the whole dataframe as a series. As said, my objective is to essentially come up with a best exponential model for each of the SKUs in the dataframe. However, I want to be able to extract information like MSE, MAPE etc. later. So kindly suggest. Thanks in advance, Phani cheers, Paul } } The error message is: Error in ets(x[i], model = "AZZ", opt.crit = c("amse")) : y should be a univariate time series. Could someone please explain why this is happening? I also want to be able to extract data like coefs and errors (MAPE, MSE etc.). Thanks and regards, Phani -- Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +3130 253 5773 http://intamap.geo.uu.nl/~paul http://intamap.geo.uu.nl/%7Epaul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 -- A. Phani Kishan 3rd Year B.Tech Dept. of Computer Science Engineering IIT MADRAS Ph: +919962363545 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Confused: Looping in dataframes
On Jun 25, 2010, at 7:09 AM, phani kishan wrote: On Fri, Jun 25, 2010 at 1:54 PM, Paul Hiemstra p.hiems...@geo.uu.nl wrote: On 06/25/2010 10:02 AM, phani kishan wrote: Hey, I have a data frame x which consists of say 10 vectors. I essentially want to find out the best fit exponential smoothing for each of the vectors. The problem while I'm getting results when i say lapply(x,ets) I am getting an error when I say myprint function(x) { for(i in 1:length(x)) { ets(x[i],model=AZZ,opt.crit=c(amse)) Hi, Please provide a reproducible example, as stated in the posting guide. My guess is that replacing x[i] by x[[i]] would solve the problem. Double brackets return a vector in stead of a data.frame that has just column i. Hey Paul, As requested. My example data frame sdata: SKU1SKU2 SKU3 SKU4 1 583.8 574.6 1106.9 648.1 2 441.7 552.8 1021.3 353.6 3 454.2 555.7 998.3 306.4 4 569.7 507.6 811.1 360.7 5 512.3 620.0 1046.3 713.9 6 580.8 668.2 732.0 490.9 7 648.5 766.9 653.4 422.1 8 617.4 657.1 602.1 190.8 9 826.8 767.3 640.5 324.1 10 1163.0 657.6 429.6 181.1 11 643.5 788.9 569.1 331.9 12 846.9 568.6 425.1 224.6 13 580.7 582.9 434.2 226.9 now when I apply lapply(sdata,ets) I get a result as: $SKU1 ETS(A,N,N) Call: ets(y = x, model = AZZ) Smoothing parameters: alpha = 0.3845 Initial states: l = 533.3698 sigma: 181.7615 AIC AICc BIC 172.6144 173.8144 173.7443 $SKU2 ETS(A,N,N) Call: ets(y = x, model = AZZ) Smoothing parameters: alpha = 0.5026 Initial states: l = 567.821 sigma: 86.7074 AIC AICc BIC 153.3704 154.5704 154.5003 $SKU3 ETS(A,A,N) Call: ets(y = x, model = AZZ) Smoothing parameters: alpha = 1e-04 beta = 1e-04 Initial states: l = 1189.2221 b = -64.3776 sigma: 85.4153 AIC AICc BIC 156.9800 161.9800 159.2398 $SKU4 ETS(A,A,N) Call: ets(y = x, model = AZZ) Smoothing parameters: alpha = 1e-04 beta = 1e-04 Initial states: l = 566.9001 b = -27.8818 sigma: 127.2654 AIC AICc BIC 167.3475 172.3475 169.6073 Now when I run the same using: myfun-function(x) { for(i in 1:length(x)) { 
ets(x[i])
  }
}

I got the error as mentioned before. Now on modifying it to

myfun <- function(x) {
  for (i in 1:length(x)) {
    return(ets(x[[i]]))
  }
}

I only got the output as ETS(A,N,N) Call: ets(y = x[[i]], model = "AZZ", opt.crit = c("amse")) Smoothing parameters: alpha = 0.3983 Initial states: l = 516.188 sigma: 181.8688 AIC AICc BIC 172.6298 173.8298 173.7597 I think it's considering the whole dataframe as a series. Doubtful. It is quietly calculating all of the requested models, but you did not do anything with them inside the loop (which is in a function). You could have assigned them to something permanent or printed them (or both):

ets_x <- list()
for (i in 1:length(x)) {
  print(ets(x[[i]]))
  ets_x <- c(ets_x, list(ets(x[[i]])))
}
ets_x

As said, my objective is to essentially come up with a best exponential model for each of the SKUs in the dataframe. However, I want to be able to extract information like MSE, MAPE etc. later. So kindly suggest. Thanks in advance, Phani __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
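[Editor's illustration, not part of the original thread.] The collect-don't-discard pattern in the advice above, as a tiny self-contained sketch (data invented for illustration; mean() stands in for the model-fitting call):

```r
x <- data.frame(a = c(1, 2, 3), b = c(4, 6, 8))

results <- vector("list", length(x))   # pre-allocate one slot per column
names(results) <- names(x)
for (i in seq_along(x)) {
  results[[i]] <- mean(x[[i]])         # stand-in for ets(x[[i]])
}
results$b  # the stored result survives after the loop: 6
```

The key points are that the loop body assigns into a structure created outside the loop, and that there is no return() cutting the loop short on its first iteration.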
Re: [R] Confused: Looping in dataframes
Hey, I only got the output once because I was returning from the function at the end of one loop. I set that right and I have printed the values. The function being used by me now is:

function(x) {
  for (i in 1:length(x)) {
    print(names(x[i]))
    print(myets(x[[i]]))
  }
}

where myets is my customized exponential smoothing model. However, the problem is that if I run my myets function individually on each of the SKUs I get values of MAPE, MSE etc. However, by running the above loop I don't get the values. How do I store the values so that I can look at them later? There are minor (not significant) changes in the values of parameters from applying the above function as opposed to lapply. Why could that be? Phani -- A. Phani Kishan 3rd Year B.Tech Dept. of Computer Science Engineering IIT MADRAS Ph: +919962363545 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
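[Editor's illustration, not part of the original thread.] One way to keep the fitted objects around for later inspection, sketched with a hypothetical fit_one() standing in for the poster's myets(), whose internals we do not see:

```r
fit_one <- function(v) list(mean = mean(v), mse = mean((v - mean(v))^2))
sdata <- data.frame(SKU1 = c(583.8, 441.7, 454.2),
                    SKU2 = c(574.6, 552.8, 555.7))

fits <- lapply(sdata, fit_one)    # named list: one fit object per SKU
sapply(fits, function(f) f$mse)   # vector of MSEs, indexed by SKU name
```

Because lapply() returns a named list rather than printing and discarding, any component of each fit (MSE, MAPE, coefficients) can be pulled out afterwards with sapply() or `[[`.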
[R] confused on model.frame evaluation
Hello! I'm reading through a logistic regression book and using R to replicate the results. Although my question is not directly related to this, it's the context I discovered it in, so here we go. Consider these data:

interco <- structure(list(white = c(1, 1, 0, 0), male = c(1, 0, 1, 0),
  yes = c(43, 26, 29, 22), no = c(134, 149, 23, 36),
  total = c(177, 175, 52, 58)),
  .Names = c("white", "male", "yes", "no", "total"),
  row.names = c(NA, -4L), class = "data.frame")

We can use logistic regression to analyze this table, using glm's syntax for successes/failures described at the top of page 191 in MASS 4th edition.

summary(glm(as.matrix(interco[c("yes", "no")]) ~ white + male,
  data = interco, family = binomial))

The output prints out, no problem! Now, another data set; note the identifying feature of this one is that it contains a column with the same name as the object (i.e., working):

working <- structure(list(france = c(1, 1, 1, 1, 0, 0, 0, 0),
  manual = c(1, 1, 0, 0, 1, 1, 0, 0), famanual = c(1, 0, 1, 0, 1, 0, 1, 0),
  total = c(107, 65, 66, 171, 87, 65, 85, 148),
  working = c(85, 44, 24, 17, 24, 22, 1, 6),
  no = c(22, 21, 42, 154, 63, 43, 84, 142)),
  .Names = c("france", "manual", "famanual", "total", "working", "no"),
  row.names = c(NA, -8L), class = "data.frame")

summary(glm(as.matrix(working[c("working", "no")]) ~ france + manual + famanual,
  data = working, family = binomial))

Error in model.frame.default(formula = as.matrix(working[c("working", :
  variable lengths differ (found for 'france')

Well, this error goes away simply by renaming the working variable in the data.frame working to something else. I found the eval line in model.frame that's throwing the error, but I'm still confused as to why. I'm sure it's not a bug, but could someone point to a thread or offer some gentle advice on what's happening? I think it's related to:

test <- data.frame(name1 = 1:5, name2 = 6:10, test = 11:15)
eval(expression(test[c("name1", "name2")]))
eval(expression(interco[c("name1", "test")]))

Thanks!
--Erik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] confused on model.frame evaluation
snip I'm sure it's not a bug, but could someone point to a thread or offer some gentle advice on what's happening? I think it's related to:

test <- data.frame(name1 = 1:5, name2 = 6:10, test = 11:15)
eval(expression(test[c("name1", "name2")]))
eval(expression(interco[c("name1", "test")]))

scratch that last one, obviously a typo was causing my confusion there! The model.frame stuff remains a mystery to me though... __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] confused on model.frame evaluation
On Apr 30, 2010, at 4:57 PM, Erik Iverson wrote: snip I'm sure it's not a bug, but could someone point to a thread or offer some gentle advice on what's happening? I think it's related to:

test <- data.frame(name1 = 1:5, name2 = 6:10, test = 11:15)
eval(expression(test[c("name1", "name2")]))
eval(expression(interco[c("name1", "test")]))

scratch that last one, obviously a typo was causing my confusion there! The model.frame stuff remains a mystery to me though...

Hi Erik, It's late on a Friday, it's grey and raining here in Minneapolis and I am short on caffeine, but, that being said, consider the following :-)

> working
  france manual famanual total working  no
1      1      1        1   107      85  22
2      1      1        0    65      44  21
3      1      0        1    66      24  42
4      1      0        0   171      17 154
5      0      1        1    87      24  63
6      0      1        0    65      22  43
7      0      0        1    85       1  84
8      0      0        0   148       6 142

> as.matrix(working[c("working", "no")])
     working  no
[1,]      85  22
[2,]      44  21
[3,]      24  42
[4,]      17 154
[5,]      24  63
[6,]      22  43
[7,]       1  84
[8,]       6 142

> with(working, as.matrix(working[c("working", "no")]))
     [,1]
[1,]   NA
[2,]   NA

For the incantations of model.frame(), the formula terms are evaluated first within the scope of the data frame indicated for the 'data' argument. Thus, in the second case, I am asking for the as.matrix(...) call to be evaluated within the scope of the 'working' data frame, which returns a matrix with only two rows, one NA for each column that was asked for and not found. That is different from the number of rows in 'working', thus you get the error as soon as the 'france' column is evaluated in the formula to create the model frame:

Error in model.frame.default(formula = as.matrix(working[c("working", :
  variable lengths differ (found for 'france')

2 rows in the response matrix versus 8 rows for 'france'...
It is kind of like you are asking for:

as.matrix(working$working[c("working", "no")])
     [,1]
[1,]   NA
[2,]   NA

Now, try this:

> with(working, matrix(c(working, no), ncol = 2))
     [,1] [,2]
[1,]   85   22
[2,]   44   21
[3,]   24   42
[4,]   17  154
[5,]   24   63
[6,]   22   43
[7,]    1   84
[8,]    6  142

and then:

summary(glm(matrix(c(working, no), ncol = 2) ~ france + manual + famanual,
  data = working, family = binomial))

Call:
glm(formula = matrix(c(working, no), ncol = 2) ~ france + manual +
    famanual, family = binomial, data = working)

Deviance Residuals:
       1        2        3        4        5        6        7        8
 0.09316 -0.14108  2.38028 -1.91838 -1.48196  1.84993 -1.61864  1.16747

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.6902     0.2547 -14.489  < 2e-16 ***
france        1.9474     0.2162   9.008  < 2e-16 ***
manual        2.5199     0.2168  11.625  < 2e-16 ***
famanual      0.5522     0.2017   2.738  0.00618 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 308.329 on 7 degrees of freedom
Residual deviance:  18.976 on 4 degrees of freedom
AIC: 60.162

Number of Fisher Scoring iterations: 4

Does that help to clarify? Regards, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
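[Editor's illustration, not part of the original thread.] The scoping trap generalizes beyond glm(): any function that evaluates an expression inside a data frame first (model.frame(), with(), subset()) will find a same-named column before the global object. A minimal sketch:

```r
working <- data.frame(working = c(85, 44), no = c(22, 21))

# Evaluated in the global environment: 'working' is the data frame.
nrow(working[c("working", "no")])           # 2 rows, as expected

# Evaluated inside the data frame: 'working' is now the numeric column,
# and character subscripts on an unnamed vector yield NAs.
with(working, working[c("working", "no")])  # NA NA
```

Renaming either the data frame or the column breaks the collision, which is exactly why the original error disappeared once the column was renamed.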
[R] confused with yearmon, xts and maybe zoo
R-listers, I am using xts with a yearmon index, but am getting some inconsistent results with the date index when I drop observations (for example by using na.omit). The issue is illustrated in the example below. If I start with a monthly zooreg series starting in 2009, yearmon converts this to Dec 2008. Not such a worry for my example, but strange. Having converted to xts, I drop the first observation. The index shows Jan 2009. But if I create a new variable with this index, it shifts the series back to Dec 2008. No doubt I am doing something wrong; very grateful for any tips.

library(xts)
z <- zooreg(1:24, frequency = 12, start = c(2009, 1)) # monthly data starting 2009
x <- xts(z, as.yearmon(index(z)))       # starts Dec 2008
xx <- x[-1, ]                           # drop first obs (eg through na.omit)
index(xx)                               # starts Jan 2009
xxx <- xts(NA[1:length(xx)], index(xx)) # back to Dec 2008
periodicity(x)
periodicity(xx)
periodicity(xxx)

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] confused with yearmon, xts and maybe zoo
On Sun, Apr 18, 2010 at 8:25 AM, simeon duckworth simeonduckwo...@gmail.com wrote: R-listers, I am using xts with a yearmon index, but am getting some inconsistent results with the date index when i drop observations (for example by using na.omit). The issue is illustrated in the example below. If I start with a monthly zooreg series starting in 2009, yearmon converts this to Dec-2008. Not such a worry for my example, but strange. Having converted to xts, i drop the first observation. The index shows jan 2009. But if i create a new variable with this index, it shifts the series back to dec 2008. No doubt i am doing something wrong. very grateful for any tips library(xts) z - zooreg(1:24,frequency=12,start=c(2009,1)) # monthly data starting 2009 x - xts(z,as.yearmon(index(z))) # starts Dec 2008 Not for me. It starts in January 2009 for me. Also please show your code in such a way that it can be pasted into a session. Either comment out the output using # or else preface input lines with so its clear what is input and what is output. And show what versions of the software and R you are using and what platform. z - zooreg(1:24,frequency=12,start=c(2009,1)) # monthly data starting 2009 head(z) 2009(1) 2009(2) 2009(3) 2009(4) 2009(5) 2009(6) 1 2 3 4 5 6 x - xts(z,as.yearmon(index(z))) head(x) x Jan 2009 1 Feb 2009 2 Mar 2009 3 Apr 2009 4 May 2009 5 Jun 2009 6 R.version.string [1] R version 2.10.1 (2009-12-14) win.version() [1] Windows Vista (build 6002) Service Pack 2 packageDescription(zoo)$Version [1] 1.6-3 packageDescription(xts)$Version [1] 0.7-0 I also tried older versions zoo 1.6-0, xts 0.6-8 and R 2.9.2 and got the same result as I got here. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] confused with yearmon, xts and maybe zoo
Hi Gabor, That's odd. I still get the same problem with the same versions of the software in your mail ... viz as.yearmon converts 2009(1) to Dec 2008, and although xts is indexed at Jan 2009 in xx, using it to create another xts object with that index reverts to Dec 2008. Grateful for any suggestions.

## code ##
library(xts)
z <- zooreg(1:24, frequency = 12, start = c(2009, 1))
x <- xts(z, as.yearmon(index(z)))
xx <- x[-1, ]
index(xx)
xxx <- xts(NA[1:length(xx)], index(xx))
periodicity(x)
periodicity(xx)
periodicity(xxx)

## results ##
periodicity(x)
Monthly periodicity from Dec 2008 to Nov 2010
periodicity(xx)
Monthly periodicity from Jan 2009 to Nov 2010
periodicity(xxx)
Monthly periodicity from Dec 2008 to Oct 2010
R.version.string
[1] "R version 2.10.1 (2009-12-14)"
win.version()
[1] "Windows XP (build 2600) Service Pack 3"
packageDescription("xts")$Version
[1] "0.7-0"
Sys.time()
[1] "2010-04-18 19:37:26 BST"

On Sun, Apr 18, 2010 at 1:25 PM, simeon duckworth simeonduckwo...@gmail.com wrote: R-listers, I am using xts with a yearmon index, but am getting some inconsistent results with the date index when i drop observations (for example by using na.omit). The issue is illustrated in the example below. If I start with a monthly zooreg series starting in 2009, yearmon converts this to Dec-2008. Not such a worry for my example, but strange. Having converted to xts, i drop the first observation. The index shows jan 2009. But if i create a new variable with this index, it shifts the series back to dec 2008. No doubt i am doing something wrong.
very grateful for any tips library(xts) z - zooreg(1:24,frequency=12,start=c(2009,1)) # monthly data starting 2009 x - xts(z,as.yearmon(index(z)))# starts Dec 2008 xx - x[-1, ] # drop first obs (eg through na.omit) index(xx) # starts jan 2009 xxx - xts(NA[1:length(xx)],index(xx))# back to dec 2008 periodicity(x) periodicity(xx) periodicty(xxx) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] confused with yearmon, xts and maybe zoo
On Sun, Apr 18, 2010 at 2:51 PM, simeon duckworth simeonduckwo...@gmail.com wrote: Hi Gabor Thats odd. I still get the same problem with the same versions of the software in your mail ... viz as.yearmon converts 2009(1) to Dec-2008 We can`t conclude that its in as.yearmon based on the output shown. What is the output of: index(z) as.yearmon(index(z)) x This is what I get: index(z) [1] 2009.000 2009.083 2009.167 2009.250 2009.333 2009.417 2009.500 2009.583 [9] 2009.667 2009.750 2009.833 2009.917 2010.000 2010.083 2010.167 2010.250 [17] 2010.333 2010.417 2010.500 2010.583 2010.667 2010.750 2010.833 2010.917 as.yearmon(index(z)) [1] Jan 2009 Feb 2009 Mar 2009 Apr 2009 May 2009 Jun 2009 [7] Jul 2009 Aug 2009 Sep 2009 Oct 2009 Nov 2009 Dec 2009 [13] Jan 2010 Feb 2010 Mar 2010 Apr 2010 May 2010 Jun 2010 [19] Jul 2010 Aug 2010 Sep 2010 Oct 2010 Nov 2010 Dec 2010 head(x) x Jan 2009 1 Feb 2009 2 Mar 2009 3 Apr 2009 4 May 2009 5 Jun 2009 6 and although xts is indexed at Jan 2009 in xx, using it to create another xts object with that index reverts to Dec-2008. grateful for any suggestions ## code ## library(xts) z - zooreg(1:24,frequency=12,start=c(2009,1)) x - xts(z,as.yearmon(index(z))) xx - x[-1, ] index(xx) xxx - xts(NA[1:length(xx)],index(xx)) periodicity(x) periodicity(xx) periodicty(xxx)b ## results ### periodicity(x) Monthly periodicity from Dec 2008 to Nov 2010 periodicity(xx) Monthly periodicity from Jan 2009 to Nov 2010 periodicity(xxx) Monthly periodicity from Dec 2008 to Oct 2010 R.version.string [1] R version 2.10.1 (2009-12-14) win.version() [1] Windows XP (build 2600) Service Pack 3 packageDescription(xts)$Version [1] 0.7-0 Sys.time() [1] 2010-04-18 19:37:26 BST On Sun, Apr 18, 2010 at 1:25 PM, simeon duckworth simeonduckwo...@gmail.com wrote: R-listers, I am using xts with a yearmon index, but am getting some inconsistent results with the date index when i drop observations (for example by using na.omit). The issue is illustrated in the example below. 
If I start with a monthly zooreg series starting in 2009, yearmon converts this to Dec-2008. Not such a worry for my example, but strange. Having converted to xts, i drop the first observation. The index shows jan 2009. But if i create a new variable with this index, it shifts the series back to dec 2008. No doubt i am doing something wrong. very grateful for any tips library(xts) z - zooreg(1:24,frequency=12,start=c(2009,1)) # monthly data starting 2009 x - xts(z,as.yearmon(index(z))) # starts Dec 2008 xx - x[-1, ] # drop first obs (eg through na.omit) index(xx) # starts jan 2009 xxx - xts(NA[1:length(xx)],index(xx)) # back to dec 2008 periodicity(x) periodicity(xx) periodicty(xxx) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] confused with yearmon, xts and maybe zoo
... forgot to post this back to the r-list. it seems that the problem is with xts rather than zoo and yearmon per se ie using yearmon to index xts gives inconsistent results. grateful for any help anyone can offer. thanks On Sun, Apr 18, 2010 at 8:15 PM, simeon duckworth simeonduckwo...@gmail.com wrote: Hi gabor It seems asthough the issue is in working with yearmon in xts. the command as.yearmon(index(z)) works in the same way as yours, but not when used to index the xts object. ## code library(xts) z - zooreg(1:24,frequency=12,start=c(2009,1)) x - xts(z,as.yearmon(index(z))) xx - x[-1, ] index(xx) xxx - xts(NA[1:length(xx)],index(xx)) index(z) as.yearmon(index(z)) head(x,3) head(xx,3) head(xxx,3) ## output index(z) [1] 2009.000 2009.083 2009.167 2009.250 2009.333 2009.417 2009.500 2009.583 [9] 2009.667 2009.750 2009.833 2009.917 2010.000 2010.083 2010.167 2010.250 [17] 2010.333 2010.417 2010.500 2010.583 2010.667 2010.750 2010.833 2010.917 as.yearmon(index(z)) [1] Jan 2009 Feb 2009 Mar 2009 Apr 2009 May 2009 Jun 2009 [7] Jul 2009 Aug 2009 Sep 2009 Oct 2009 Nov 2009 Dec 2009 [13] Jan 2010 Feb 2010 Mar 2010 Apr 2010 May 2010 Jun 2010 [19] Jul 2010 Aug 2010 Sep 2010 Oct 2010 Nov 2010 Dec 2010 head(x,3) x Dec 2008 1 Jan 2009 2 Feb 2009 3 head(xx,3) x Jan 2009 2 Feb 2009 3 Mar 2009 4 head(xxx,3) [,1] Dec 2008 NA Jan 2009 NA Feb 2009 NA On Sun, Apr 18, 2010 at 8:00 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Sun, Apr 18, 2010 at 2:51 PM, simeon duckworth simeonduckwo...@gmail.com wrote: Hi Gabor Thats odd. I still get the same problem with the same versions of the software in your mail ... viz as.yearmon converts 2009(1) to Dec-2008 We can`t conclude that its in as.yearmon based on the output shown. 
What is the output of: index(z) as.yearmon(index(z)) x This is what I get: index(z) [1] 2009.000 2009.083 2009.167 2009.250 2009.333 2009.417 2009.500 2009.583 [9] 2009.667 2009.750 2009.833 2009.917 2010.000 2010.083 2010.167 2010.250 [17] 2010.333 2010.417 2010.500 2010.583 2010.667 2010.750 2010.833 2010.917 as.yearmon(index(z)) [1] Jan 2009 Feb 2009 Mar 2009 Apr 2009 May 2009 Jun 2009 [7] Jul 2009 Aug 2009 Sep 2009 Oct 2009 Nov 2009 Dec 2009 [13] Jan 2010 Feb 2010 Mar 2010 Apr 2010 May 2010 Jun 2010 [19] Jul 2010 Aug 2010 Sep 2010 Oct 2010 Nov 2010 Dec 2010 head(x) x Jan 2009 1 Feb 2009 2 Mar 2009 3 Apr 2009 4 May 2009 5 Jun 2009 6 and although xts is indexed at Jan 2009 in xx, using it to create another xts object with that index reverts to Dec-2008. grateful for any suggestions ## code ## library(xts) z - zooreg(1:24,frequency=12,start=c(2009,1)) x - xts(z,as.yearmon(index(z))) xx - x[-1, ] index(xx) xxx - xts(NA[1:length(xx)],index(xx)) periodicity(x) periodicity(xx) periodicty(xxx)b ## results ### periodicity(x) Monthly periodicity from Dec 2008 to Nov 2010 periodicity(xx) Monthly periodicity from Jan 2009 to Nov 2010 periodicity(xxx) Monthly periodicity from Dec 2008 to Oct 2010 R.version.string [1] R version 2.10.1 (2009-12-14) win.version() [1] Windows XP (build 2600) Service Pack 3 packageDescription(xts)$Version [1] 0.7-0 Sys.time() [1] 2010-04-18 19:37:26 BST On Sun, Apr 18, 2010 at 1:25 PM, simeon duckworth simeonduckwo...@gmail.com wrote: R-listers, I am using xts with a yearmon index, but am getting some inconsistent results with the date index when i drop observations (for example by using na.omit). The issue is illustrated in the example below. If I start with a monthly zooreg series starting in 2009, yearmon converts this to Dec-2008. Not such a worry for my example, but strange. Having converted to xts, i drop the first observation. The index shows jan 2009. But if i create a new variable with this index, it shifts the series back to dec 2008. 
No doubt I am doing something wrong. Very grateful for any tips.

library(xts)
z <- zooreg(1:24, frequency = 12, start = c(2009, 1))  # monthly data starting 2009
x <- xts(z, as.yearmon(index(z)))       # starts Dec 2008
xx <- x[-1, ]                           # drop first obs (e.g. through na.omit)
index(xx)                               # starts Jan 2009
xxx <- xts(NA[1:length(xx)], index(xx)) # back to Dec 2008
periodicity(x)
periodicity(xx)
periodicity(xxx)

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
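A minimal check of the index round-trip, as a hedged sketch: the behaviour described above was reported against xts 0.7-0 and may be version-specific, but building the object from the coredata with an explicit yearmon order.by makes the intended index unambiguous.

```r
library(xts)  # loads zoo as well
z <- zooreg(1:24, frequency = 12, start = c(2009, 1))

# Explicit order.by, so the index is unambiguously Jan 2009 .. Dec 2010
x <- xts(coredata(z), order.by = as.yearmon(index(z)))
head(index(x))

xx <- x[-1, ]
# Reusing index(xx) for a new series should preserve the same start
xxx <- xts(rep(NA_real_, length(xx)), order.by = index(xx))
start(xxx) == start(xx)   # expected TRUE on a current xts
```

If this comparison comes back FALSE on a given installation, that would point at the same index-shifting bug the poster is seeing rather than at the user's code.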
Re: [R] confused by classes and methods.
Hi Rob, I just started reading about classes (and also learning R), so I apologize if the following code confuses you more. I simplified the code somewhat in order to better understand what's going on. I was wondering: are you deliberately reimplementing the builtin update() function?

setClass(Class = "StatisticInfo",
  representation(
    oldData = "data.frame",
    newData = "data.frame"
  )
)
# declare the update method, even though it exists already
setGeneric(
  name = "update",
  def = function(object){ standardGeneric("update") }
)
setMethod(f = "update", signature("StatisticInfo"),
  definition = function(object){
    min <- min(object@newData, object@oldData, na.rm = TRUE)
    avg <- mean(mean(cbind(object@newData, object@oldData)))
    max <- max(object@newData, object@oldData, na.rm = TRUE)
    return(list(min, avg, max))
  }
)
old <- data.frame(runif(10, 1, 10))
new <- data.frame(runif(10, 1, 9))
instance <- new(Class = "StatisticInfo", oldData = old, newData = new)
update(instance)

Does this make sense to you? Cheers!! Albert-Jan

~~ In the face of ambiguity, refuse the temptation to guess. ~~

--- On Tue, 3/9/10, Rob Forler rfor...@uchicago.edu wrote:
From: Rob Forler rfor...@uchicago.edu
Subject: [R] confused by classes and methods.
To: r-help@r-project.org
Date: Tuesday, March 9, 2010, 12:09 AM

Hello, I have a simple class that looks like:

setClass("statisticInfo",
  representation(
    max = "numeric",
    min = "numeric",
    beg = "numeric",
    current = "numeric",
    avg = "numeric",
    obs = "vector"
  )
)

and the following function:

updateStatistic <- function(statistic, newData){
  statistic@obs <- c(statistic@obs, newData)
  statistic@max <- max(newData, statistic@max, na.rm = TRUE)
  statistic@min <- min(newData, statistic@min, na.rm = TRUE)
  statistic@avg <- mean(statistic@obs)
  statistic@current <- newData
  if(length(statistic@obs) == 1 || is.na(statistic@beg)){
    statistic@beg <- newData
  }
  return(statistic)
}

Firstly, I know you can use methods, which seems to add some value.
I looked at http://developer.r-project.org/methodDefinition.html but when I try

setMethod("update", signature(statistic = "statisticInfo", newData = "numeric"),
  function(statistic, newData){
    statistic@obs <- c(statistic@obs, newData)
    statistic@max <- max(newData, statistic@max, na.rm = TRUE)
    statistic@min <- min(newData, statistic@min, na.rm = TRUE)
    statistic@avg <- mean(statistic@obs)
    statistic@current <- newData
    if(length(statistic@obs) == 1 || is.na(statistic@beg)){
      statistic@beg <- newData
    }
    return(statistic)
  }
)

I get:

Creating a new generic function for "update" in .GlobalEnv
Error in match.call(fmatch, fcall) :
  unused argument(s) (statistic = "statisticInfo", newData = "numeric")

1: source("tca.init.R", chdir = T)
2: eval.with.vis(ei, envir)
3: eval.with.vis(expr, envir, enclos)
4: source("../../studies/tca.tradeClassifyFuncs.R")
5: eval.with.vis(ei, envir)
6: eval.with.vis(expr, envir, enclos)
7: setMethod("update", signature(statistic = "statisticInfo", newData = "numeric"), function(statistic, newData) {
8: isSealedMethod(f, signature, fdef, where = where)
9: getMethod(f, signature, optional = TRUE, where = where, fdef = fGen)
10: matchSignature(signature, f

I don't understand this; any help would be appreciated. Secondly, can anyone give examples of where methods are used that make sense, besides just checking the class of inputs? Thirdly, I've looked into passing by reference in R, and some options come up, but in general they seem to be fairly complicated. I would like update to work like my updateStatistic function, but without having to return a new object. Something like:

statList <- list(new("statisticInfo"))
updateStatistic(statList[[1]], 3)
statList[[1]]  # this would then have the updated object and not the old one

Anyway, the main reason I'm asking these questions is that I can't really find a good online resource for this. Any help would be greatly appreciated.
Thanks, Rob
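The "unused argument(s)" error above comes from a formals mismatch: the pre-existing update() generic has formals (object, ...), so a method signature naming its arguments statistic/newData cannot be matched against it. Keeping the generic's argument name and letting newData ride along through "..." works; the sketch below is one way to do it (slot types simplified to "numeric", which is an assumption, not the poster's exact class).

```r
# Minimal sketch: an S4 method on the existing update() generic.
# The generic's formals are (object, ...), so the first method
# argument must be named "object"; extra args are matched from "...".
setClass("statisticInfo",
         representation(max = "numeric", min = "numeric", beg = "numeric",
                        current = "numeric", avg = "numeric", obs = "numeric"))

setMethod("update", signature(object = "statisticInfo"),
  function(object, newData, ...) {
    object@obs     <- c(object@obs, newData)
    object@max     <- max(newData, object@max, na.rm = TRUE)
    object@min     <- min(newData, object@min, na.rm = TRUE)
    object@avg     <- mean(object@obs)
    object@current <- newData
    if (length(object@obs) == 1 || is.na(object@beg))
      object@beg <- newData
    object
  })

s <- new("statisticInfo", max = -Inf, min = Inf, beg = NA_real_,
         avg = NA_real_, current = NA_real_, obs = numeric(0))
s <- update(s, 3)   # S4 is copy-on-modify: reassign the result
s@beg               # 3
```

Note the reassignment `s <- update(s, 3)`: S4 objects have value semantics, so a method cannot modify its argument in place, which is also the answer to the poster's third question.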
[R] confused by classes and methods.
Hello, I have a simple class that looks like:

setClass("statisticInfo",
  representation(
    max = "numeric",
    min = "numeric",
    beg = "numeric",
    current = "numeric",
    avg = "numeric",
    obs = "vector"
  )
)

and the following function:

updateStatistic <- function(statistic, newData){
  statistic@obs <- c(statistic@obs, newData)
  statistic@max <- max(newData, statistic@max, na.rm = TRUE)
  statistic@min <- min(newData, statistic@min, na.rm = TRUE)
  statistic@avg <- mean(statistic@obs)
  statistic@current <- newData
  if(length(statistic@obs) == 1 || is.na(statistic@beg)){
    statistic@beg <- newData
  }
  return(statistic)
}

Firstly, I know you can use methods, which seems to add some value. I looked at http://developer.r-project.org/methodDefinition.html but when I try

setMethod("update", signature(statistic = "statisticInfo", newData = "numeric"),
  function(statistic, newData){
    statistic@obs <- c(statistic@obs, newData)
    statistic@max <- max(newData, statistic@max, na.rm = TRUE)
    statistic@min <- min(newData, statistic@min, na.rm = TRUE)
    statistic@avg <- mean(statistic@obs)
    statistic@current <- newData
    if(length(statistic@obs) == 1 || is.na(statistic@beg)){
      statistic@beg <- newData
    }
    return(statistic)
  }
)

I get:

Creating a new generic function for "update" in .GlobalEnv
Error in match.call(fmatch, fcall) :
  unused argument(s) (statistic = "statisticInfo", newData = "numeric")

1: source("tca.init.R", chdir = T)
2: eval.with.vis(ei, envir)
3: eval.with.vis(expr, envir, enclos)
4: source("../../studies/tca.tradeClassifyFuncs.R")
5: eval.with.vis(ei, envir)
6: eval.with.vis(expr, envir, enclos)
7: setMethod("update", signature(statistic = "statisticInfo", newData = "numeric"), function(statistic, newData) {
8: isSealedMethod(f, signature, fdef, where = where)
9: getMethod(f, signature, optional = TRUE, where = where, fdef = fGen)
10: matchSignature(signature, f

I don't understand this; any help would be appreciated. Secondly, can anyone give examples of where methods are used that make sense, besides just checking the class of inputs?
Thirdly, I've looked into passing by reference in R, and some options come up, but in general they seem to be fairly complicated. I would like update to work like my updateStatistic function, but without having to return a new object. Something like:

statList <- list(new("statisticInfo"))
updateStatistic(statList[[1]], 3)
statList[[1]]  # this would then have the updated object and not the old one

Anyway, the main reason I'm asking these questions is that I can't really find a good online resource for this. Any help would be greatly appreciated. Thanks, Rob
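On the pass-by-reference question: environments are the one base-R structure with reference semantics, and Reference Classes (setRefClass, in R since 2.12) package that idea for exactly this use case. A hedged sketch, with the class name StatRC made up for illustration:

```r
# Reference Classes have reference (in-place) semantics, unlike S4
StatRC <- setRefClass("StatRC",
  fields = list(obs = "numeric"),
  methods = list(
    add = function(newData) {
      obs <<- c(obs, newData)  # "<<-" assigns into the object's own environment
      invisible(.self)
    },
    avg = function() mean(obs)
  ))

statList <- list(StatRC$new(obs = numeric(0)))
statList[[1]]$add(3)   # modifies in place; no reassignment needed
statList[[1]]$obs      # 3
```

Because the list element holds a reference to the object, mutating it through one name is visible through every name, which is exactly the statList behaviour asked for above.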
[R] Confused about appending to list behavior...
Through help from the list and a little trial and error (mainly error) I think I figured out a couple of ways to append to a list. Now I am trying to access the data that I appended to the list. The example below shows where I'm trying to access that information via two different methods. It turns out that trying to access the data the way one would access elements in a data.frame does not work. However, the standard way of accessing data from a list, i.e. [[...]], seems to provide an answer. By any chance is there more documentation out there on lists and this behavior? I would like to better understand what is really going on and why one approach works and another doesn't. Thank you again for all the help and feedback. I love lists, especially the fact that (unlike data.frames) you can store different types of data and also arrays of different lengths. They are great.

> example_list <- list(tracking<-c("house"), house_type<-c("brick", "wood"), sizes<-c(1600, 1800, 2000, 2400))
> example_list
[[1]]
[1] "house"
[[2]]
[1] "brick" "wood"
[[3]]
[1] 1600 1800 2000 2400
> cost_limits <- c(20.25, 350010.15)
> example_list[[4]] <- cost_limits
> example_list
[[1]]
[1] "house"
[[2]]
[1] "brick" "wood"
[[3]]
[1] 1600 1800 2000 2400
[[4]]
[1]     20.2 350010.2
> c(example_list, list(CostStuff = cost_limits))
[[1]]
[1] "house"
[[2]]
[1] "brick" "wood"
[[3]]
[1] 1600 1800 2000 2400
[[4]]
[1]     20.2 350010.2
$CostStuff
[1]     20.2 350010.2
> example_list$CostStuff
NULL
> example_list[[5]]
Error in example_list[[5]] : subscript out of bounds
> example_list[[4]]
[1]     20.2 350010.2
Re: [R] Confused about appending to list behavior...
JustADude wrote:
... By any chance is there more documentation out there on lists and this behavior, as I would like to try to better understand what is really going on and why one approach works and another doesn't. ...
Example reproduced below.

You forgot an assignment. Dieter

example_list <- list(tracking<-c("house"), house_type<-c("brick", "wood"), sizes<-c(1600, 1800, 2000, 2400))
example_list
cost_limits <- c(20.25, 350010.15)
example_list[[4]] <- cost_limits
example_list
# you forgot the left side here
# c(example_list, list(CostStuff = cost_limits))
# Should be
example_list <- c(example_list, list(CostStuff = cost_limits))
example_list$CostStuff
example_list[[5]]
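A side note on why $CostStuff was NULL in the first attempt, beyond the missing assignment: inside list(), `tracking<-c("house")` performs an assignment and contributes an *unnamed* element; names require `=`. A small sketch:

```r
# "=" inside list() creates named elements; "<-" does not
unnamed <- list(tracking <- "house")  # assigns to `tracking`, element has no name
named   <- list(tracking = "house")
names(unnamed)   # NULL
names(named)     # "tracking"

# Appending by name grows the list and names the element in one step
named$CostStuff <- c(20.25, 350010.15)
named[["CostStuff"]]   # [[ ]] access works
named$CostStuff        # and so does $ access
```

So `example_list$CostStuff <- cost_limits` on the original list would have achieved in one step what the c()-plus-assignment version does.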
[R] Confused on using expand.grid(), array(), image() and npudens(np) in my case
Hi all, I want to use the npudens() function in the np package (multivariate kernel density estimation), but was confused by several functions in the following code: expand.grid(), array(), image() and npudensbw(). This confusion only arises in >= 3 dimensions. I marked the four places of confusion as confusion1-4. I think there should be some kind of correspondence between those four places, but I cannot figure it out. Thanks very much for chewing on this.

# simulated dataset: d
x1 <- c(runif(100,0,1), runif(50,0.67,1)); y1 <- c(runif(100,0,1), runif(50,0.67,1)); d1 <- data.frame(x1,y1); colnames(d1) <- c("x","y")
x2 <- c(runif(100,0,1), runif(50,0.33,0.67)); y2 <- c(runif(100,0,1), runif(50,0.33,0.67)); d2 <- data.frame(x2,y2); colnames(d2) <- c("x","y")
x3 <- c(runif(100,0,1), runif(50,0,0.33)); y3 <- c(runif(100,0,1), runif(50,0,0.33)); d3 <- data.frame(x3,y3); colnames(d3) <- c("x","y")
d <- rbind(d1,d2,d3)
d$tf <- c(rep(1,150), rep(2,150), rep(3,150))
plot(d1); points(d2, col="red"); points(d3, col="green")
attach(d)

# Confusion 1: how to specify the formula in npudensbw() correctly?
I find the sequence of ordered(tf)+x+y is important, and here I may have a wrong specification:

bw <- npudensbw(formula = ~ordered(tf)+x+y, bwmethod = "cv.ml")  # confusion1
year.seq <- sort(unique(d$tf))  # length is 3
x.seq <- seq(0,1,0.02)          # length is 51
y.seq <- seq(0,1,0.02)          # length is 51

# Confusion 2: what is the correct sequence for the three variables (year.seq, x.seq and y.seq) in expand.grid()?
data.eval <- expand.grid(tf=year.seq, x=x.seq, y=y.seq)  # confusion2
fhat <- fitted(npudens(bws=bw, newdata=data.eval))

# Confusion 3: what is the correct sequence for the three variables in the c() argument of array()?
f <- array(fhat, c(51,51,3))  # number of year.seq is 3; x.seq and y.seq are 51 each; confusion3

brks <- quantile(f, seq(0,1,0.05)); cols <- heat.colors(length(brks)-1); oldpar <- par(mfrow=c(1,3))

# Confusion 4: what is the correct sequence for the three variables (tf, x and y) in image()?
for (i in 1:3) image(x.seq, y.seq, f[,,i], asp=1, xlab="", ylab="", main=i, breaks=brks, col=cols)  # confusion4
par(oldpar)

# This is also confusing in 4, 5 and more dimensions.

Any help or suggestions are greatly appreciated.
--
Jane Chang
Queen's
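The key correspondence here is not specific to np: expand.grid() varies its *first* argument fastest, and array() fills its *first* dimension fastest, so the variable order in expand.grid() must match the dimension order in array(). With the code above, expand.grid(tf=..., x=..., y=...) would pair with array(fhat, c(3, 51, 51)) and slices f[i, , ], not array(fhat, c(51, 51, 3)). A small self-checking sketch (dimension sizes deliberately different so a mismatch would be caught):

```r
x.seq <- 1:2; y.seq <- 1:3; t.seq <- 1:4
g <- expand.grid(x = x.seq, y = y.seq, t = t.seq)  # x varies fastest
v <- g$x + 10 * g$y + 100 * g$t                    # stand-in for fitted densities

# First expand.grid variable = first array dimension, and so on
a <- array(v, dim = c(length(x.seq), length(y.seq), length(t.seq)))
a[2, 3, 4] == 2 + 10 * 3 + 100 * 4   # TRUE: a[i,j,k] is the (x.seq[i], y.seq[j], t.seq[k]) cell

# image(x.seq, y.seq, a[, , k]) then plots the x-by-y slice at t.seq[k]
```

The same rule extends unchanged to 4, 5 or more dimensions: keep the expand.grid() argument order and the array() dim order identical, then index the non-plotted dimensions when calling image().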
[R] Confused - better empirical results with error in data
Hi, I have a strange one for the group. We have a system that predicts probabilities using a fairly standard SVM (e1071). We are looking at probabilities of a binary outcome. The input data is generated by a perl script that calculates a bunch of things, fetches data from a database, etc. We train the system on 30,000 examples and then test the system on an unseen set of 5,000 records. The real-world results on the test set looked VERY good. We were really happy with our model. Then, we noticed that there was a big error in our data generation script and one of the values (an average of sorts) was being calculated incorrectly. (The perl script failed to clear two iterators, so they both grew with every record.) As a quick experiment, we removed that item from our data set and re-ran the process. The results were not very good; perhaps 75% as good as training with the wrong factor included. So, this is really a philosophical question. Do we:

1) Shrug and say, who cares, the SVM figured it out and likes that bad data item for some inexplicable reason
2) Tear into the math and try to figure out WHY the SVM is predicting more accurately

Any opinions?? Thanks!
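To make the described bug concrete, here is a hedged R reconstruction of the Perl mistake (the accumulator names come from later posts in this thread; the data are made up): because the accumulators are never reset, each record's "average" is actually a running average over every record processed so far, which silently encodes the processing order.

```r
scores <- list(c(3, 5), c(10), c(2, 4, 6))  # per-record score sets (made up)

# Intended feature: the average within each record
intended <- sapply(scores, mean)            # 4, 10, 4

# The bug: $total_score / $total_found with accumulators never reset,
# so each value is the running average over all records so far
total_score <- 0; total_found <- 0
buggy <- numeric(length(scores))
for (i in seq_along(scores)) {
  total_score <- total_score + sum(scores[[i]])
  total_found <- total_found + length(scores[[i]])
  buggy[i] <- total_score / total_found
}
buggy                                       # 4, 6, 5: depends on record order
```

Note how the buggy feature differs record by record from the intended one and changes if the file is sorted differently, which is why order-dependence (and possible information leakage) is the first thing to rule out.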
Re: [R] Confused - better empirical results with error in data
Predicting whilst confused is unlikely to produce sound predictions... my vote is for finding out why before believing anything.

> Noah Silverman n...@smartmediacorp.com 09/07/09 8:33 PM
Hi, I have a strange one for the group. We have a system that predicts probabilities using a fairly standard SVM (e1071). We are looking at probabilities of a binary outcome. The input data is generated by a perl script that calculates a bunch of things, fetches data from a database, etc. We train the system on 30,000 examples and then test the system on an unseen set of 5,000 records. The real-world results on the test set looked VERY good. We were really happy with our model. Then, we noticed that there was a big error in our data generation script and one of the values (an average of sorts) was being calculated incorrectly. (The perl script failed to clear two iterators, so they both grew with every record.) As a quick experiment, we removed that item from our data set and re-ran the process. The results were not very good; perhaps 75% as good as training with the wrong factor included. So, this is really a philosophical question. Do we:
1) Shrug and say, who cares, the SVM figured it out and likes that bad data item for some inexplicable reason
2) Tear into the math and try to figure out WHY the SVM is predicting more accurately
Any opinions?? Thanks!
Re: [R] Confused - better empirical results with error in data
On Mon, Sep 7, 2009 at 12:33 PM, Noah Silverman n...@smartmediacorp.com wrote:
SNIP
So, this is really a philosophical question. Do we:
1) Shrug and say, who cares, the SVM figured it out and likes that bad data item for some inexplicable reason
2) Tear into the math and try to figure out WHY the SVM is predicting more accurately
Any opinions?? Thanks!

Boy, I'd sure think you'd want to know why it worked with the 'wrong' calculations. It's not that the math is wrong, really, but rather that it wasn't what you thought it was. I cannot see why you wouldn't want to know why this mistake helped. Won't future projects benefit? Just my 2 cents, Mark
Re: [R] Confused - better empirical results with error in data
You both make good points. Ideally, it would be nice to know WHY it works. Without digging into too much verbiage, the system is designed to predict the outcome of certain events. The broken model predicts outcomes correctly much more frequently than one with the broken data withheld. So, to answer Mark's question, we say it's better because we see much better results with our broken model when applied to real-world data used for testing.

I have one theory. The data is listed in our CSV file from newest to oldest. We are supposed to calculate a value that is an average of some items. We loop through some queries to our database and increment two variables: $total_found and $total_score. The final value is simply $total_score / $total_found. Our programmer forgot to reset both $total_score and $total_found back to zero for each record we process, so both grow. I think that this may, in a way, be some warped form of a recency-weighted score. The newer records will have a score more affected by their contribution to the wrongly growing totals. A record that is closer to the end of the data set will be starting with HUGE values for $total_score and $total_found, so the addition of its values will have very little effect.

We've done the following so far today (note: scores are just relative to indicate performance; higher is better):
1) Run with bad data = 6.9
2) Run with bad data missing = 5.5
3) Run with correct data = ?? (We're running now; it will take a few hours to compute.)

I might also try to plot the bad data. It would be interesting to see what shape it has...

On 9/7/09 1:05 PM, Mark Knecht wrote:
On Mon, Sep 7, 2009 at 12:33 PM, Noah Silverman n...@smartmediacorp.com wrote:
SNIP
So, this is really a philosophical question. Do we:
1) Shrug and say, who cares, the SVM figured it out and likes that bad data item for some inexplicable reason
2) Tear into the math and try to figure out WHY the SVM is predicting more accurately
Any opinions?? Thanks!
Boy, I'd sure think you'd want to know why it worked with the 'wrong' calculations. It's not that the math is wrong, really, but rather that it wasn't what you thought it was. I cannot see why you wouldn't want to know why this mistake helped. Won't future projects benefit? Just my 2 cents, Mark
Re: [R] Confused - better empirical results with error in data
On Mon, Sep 7, 2009 at 1:22 PM, Noah Silverman n...@smartmediacorp.com wrote:
SNIP
The data is listed in our CSV file from newest to oldest. We are supposed to calculate a value that is an average of some items. We loop through some queries to our database and increment two variables: $total_found and $total_score. The final value is simply $total_score / $total_found.
SNIP

This does seem like it's rife with possibilities for non-causal action. (Assuming you process from newest toward oldest, which is what I think you say you are doing...) I'm pretty sure that if I knew the Dow was going to be higher 3 months from now, then my day trading results would tend toward long vs short and I'd do better. Unfortunately I don't know where it will be and cannot really do that. Have you considered processing the data in the other direction? Not in R, but rather reversing the data frame, or better yet writing the csv file in date order? Cheers, Mark
Re: [R] Confused - better empirical results with error in data
Interesting point. Our data is NOT continuous. Sure, some of the test examples are older than others, but there is no relationship between them. (More Markov-like in behavior.) When creating a specific record, we actually account for this in our SQL queries, which tend to be along the lines of:

select x from table where id=1234 and date < '2008-05-01'

This way, whatever data we're looking at, we set things up so that current and future data doesn't exist yet. My understanding was that an SVM wouldn't care about the order of the data input as long as the examples are independent. Regardless of all this, we use a real-world test for our evaluation:
1) We trained the system on examples prior to a certain date.
2) We test the system with unseen examples after that date.
We take the approach of: if we had used this model, what would our portfolio be at the end of the test period? Sure, we also look at things like AUC and R2 (from applying the model to the TEST data). Generally, we see a correlation between AUC, R2, and our final result, but not a perfect one. A model with a SLIGHTLY lower R2 actually produced better results in a few cases. This process should produce solid results, as we are eliminating any chance of over-fitting when measuring performance. So, one could argue that whatever gives the best results on the test data is the best model, regardless of the correctness of the theory. Just for fun, I'll see if I can schedule a few hours to run the same experiment with the training data order reversed. If I'm correct, the results should be the same. Thanks! -- N

On 9/7/09 2:34 PM, Mark Knecht wrote:
On Mon, Sep 7, 2009 at 1:22 PM, Noah Silverman n...@smartmediacorp.com wrote:
SNIP
The data is listed in our CSV file from newest to oldest. We are supposed to calculate a value that is an average of some items. We loop through some queries to our database and increment two variables: $total_found and $total_score. The final value is simply $total_score / $total_found.
SNIP
This does seem like it's rife with possibilities for non-causal action. (Assuming you process from newest toward oldest, which is what I think you say you are doing...) I'm pretty sure that if I knew the Dow was going to be higher 3 months from now, then my day trading results would tend toward long vs short and I'd do better. Unfortunately I don't know where it will be and cannot really do that. Have you considered processing the data in the other direction? Not in R, but rather reversing the data frame, or better yet writing the csv file in date order? Cheers, Mark
Re: [R] Confused about behavior of an S4 object containing a ts object
I posted the question below about a month ago but received no response. I still have not been able to figure out what is happening. I also noticed another oddity: when the data part of the object is a multivariate time series, it doesn't show up in the structure, but it can be treated as a multivariate time series. Is this a bug in str?

> setClass("tsExtended", representation = representation(description = "character"), contains = "ts")
[1] "tsExtended"
> tmp <- new("tsExtended", matrix(1:20, ncol=2), description = "My Time Series")
> tsp(tmp) <- c(1, 5.5, 2)
> tmp
Object of class "tsExtended"
Time Series:
Start = c(1, 1)
End = c(5, 2)
Frequency = 2
    Series 1 Series 2
1.0        1       11
1.5        2       12
2.0        3       13
2.5        4       14
3.0        5       15
3.5        6       16
4.0        7       17
4.5        8       18
5.0        9       19
5.5       10       20
Slot "description":
[1] "My Time Series"
> str(tmp)
Formal class 'tsExtended' [package ".GlobalEnv"] with 4 slots
  ..@ .Data      : int [1:20] 1 2 3 4 5 6 7 8 9 10 ...
  ..@ description: chr "My Time Series"
  ..@ tsp        : num [1:3] 1 5.5 2
  ..@ .S3Class   : chr "ts"
> tmp[,1]
Time Series:
Start = c(1, 1)
End = c(5, 2)
Frequency = 2
 [1]  1  2  3  4  5  6  7  8  9 10
> plot(tmp[,2])

Mark Lyman

-----Original Message-----
From: Lyman, Mark
Sent: Thursday, December 18, 2008 1:02 PM
To: 'r-help@r-project.org'
Subject: Confused about behavior of an S4 object containing a ts object

I am trying to define an S4 class that contains a ts class object; a simple example is shown in the code below. However, when I try to create a new object of this class, the tsp part is ignored (see below). Am I doing something wrong, or is this just a peril of mixing S3 and S4 objects?
> setClass("tsExtended", representation = representation(description = "character"), contains = "ts")
[1] "tsExtended"
> new("tsExtended", ts(1:10, frequency = 2), description = "My Time Series")
Object of class "tsExtended"
Time Series:
Start = 1
End = 10
Frequency = 1
 [1]  1  2  3  4  5  6  7  8  9 10
Slot "description":
[1] "My Time Series"

# This however seems to work
> tmp <- new("tsExtended", 1:10, description = "My Time Series")
> tsp(tmp) <- c(1, 5.5, 2)
> tmp
Object of class "tsExtended"
Time Series:
Start = c(1, 1)
End = c(5, 2)
Frequency = 2
 [1]  1  2  3  4  5  6  7  8  9 10
Slot "description":
[1] "My Time Series"

Mark Lyman, Statistician
Engineering Systems Integration, ATK
[R] Confused about behavior of an S4 object containing a ts object
I am trying to define an S4 class that contains a ts class object; a simple example is shown in the code below. However, when I try to create a new object of this class, the tsp part is ignored (see below). Am I doing something wrong, or is this just a peril of mixing S3 and S4 objects?

> setClass("tsExtended", representation = representation(description = "character"), contains = "ts")
[1] "tsExtended"
> new("tsExtended", ts(1:10, frequency = 2), description = "My Time Series")
Object of class "tsExtended"
Time Series:
Start = 1
End = 10
Frequency = 1
 [1]  1  2  3  4  5  6  7  8  9 10
Slot "description":
[1] "My Time Series"

# This however seems to work
> tmp <- new("tsExtended", 1:10, description = "My Time Series")
> tsp(tmp) <- c(1, 5.5, 2)
> tmp
Object of class "tsExtended"
Time Series:
Start = c(1, 1)
End = c(5, 2)
Frequency = 2
 [1]  1  2  3  4  5  6  7  8  9 10
Slot "description":
[1] "My Time Series"

Mark Lyman, Statistician
Engineering Systems Integration, ATK
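A hedged workaround sketch: since the tsp attribute clearly survives assignment after construction, one option is to copy it from the original series explicitly. Whether new() should propagate it automatically may depend on the R version (the posts above were against R 2.7/2.8), so this only claims to restore the attribute, not to explain the S3/S4 mixing behaviour.

```r
setClass("tsExtended",
         representation = representation(description = "character"),
         contains = "ts")

z <- ts(1:10, frequency = 2)
tmp <- new("tsExtended", z, description = "My Time Series")
tsp(tmp) <- tsp(z)   # re-apply the tsp attribute dropped during new()
frequency(tmp)       # 2, as intended
```

Copying tsp(z) rather than retyping c(1, 5.5, 2) keeps the workaround in sync with however the source series was defined.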
[R] Confused with default device setup
When invoking dev.new() on my Mac OS X 10.4.11, I get an X11 window instead of quartz, which I would find more desirable. So I'd like to set the default device to quartz. However, I'm confused because of the following:

> Sys.getenv("R_DEFAULT_DEVICE")
R_DEFAULT_DEVICE
        "quartz"
> getOption("device")
[1] "X11"

What's going on? Also, is the file Renviron under /Library/Frameworks/R.framework/Resources/etc/ppc/ the one I should modify if I want to change some environment variables? But I don't see R_DEFAULT_DEVICE there. TIA, Gang
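One way to force the default, sketched under the assumption that the build has quartz available: dev.new() consults options("device"), so setting that option directly sidesteps the question of how R_DEFAULT_DEVICE feeds into it (which is presumably where the two values above got out of sync).

```r
# In ~/.Rprofile (run at startup), make quartz the default device:
options(device = "quartz")

# Then, in a session:
getOption("device")   # "quartz"
dev.new()             # should now open a quartz window
```

Setting the option in ~/.Rprofile makes it per-user and per-session, without touching the installation-wide Renviron file.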
Re: [R] Confused with default device setup
This was also posted on R-sig-mac, and I've answered it there. Please don't cross-post.

On Wed, 15 Oct 2008, Gang Chen wrote:
When invoking dev.new() on my Mac OS X 10.4.11, I get an X11 window instead of quartz, which I would find more desirable. So I'd like to set the default device to quartz. However, I'm confused because of the following:

> Sys.getenv("R_DEFAULT_DEVICE")
R_DEFAULT_DEVICE
        "quartz"
> getOption("device")
[1] "X11"

What's going on? Also, is the file Renviron under /Library/Frameworks/R.framework/Resources/etc/ppc/ the one I should modify if I want to change some environment variables? But I don't see R_DEFAULT_DEVICE there. TIA, Gang

-- Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
[R] confused about CORREP cor.LRtest
After some struggling with the data format, non-standard in Bioconductor, I have gotten cor.balance in package CORREP to work. My desire was to obtain maximum-likelihood p-values from the same data object using cor.LRtest, but it appears that this function wants something different, which I can't figure out from the documentation.

Briefly, my dataset consists of 36 samples from 12 conditions, and I have 497 genes of interest to be correlated.

The following works:
M <- cor.balance(stddata, m = 3, G = 497)

The following does not:
M.p <- cor.LRtest(stddata, m1 = 3, m2 = 3)

Do I need to do something to stddata between example 1 and 2, or does m stand for something different in the two examples? sessionInfo follows.

Thanks,
Mark

> sessionInfo()
R version 2.7.0 Under development (unstable) (2008-03-05 r44683)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] grid      tools     stats     graphics  grDevices datasets  utils
[8] methods   base

other attached packages:
 [1] rat2302_2.0.1        Rgraphviz_1.17.13    graph_1.17.17
 [4] igraph_0.5           CORREP_1.5.0         e1071_1.5-17
 [7] class_7.2-41         affy_1.17.8          preprocessCore_1.1.5
[10] affyio_1.7.13        Biobase_1.99.1

loaded via a namespace (and not attached):
[1] cluster_1.11.10

--
Mark W. Kimpel MD  **  Neuroinformatics  **  Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, Mobile VoiceMail
(317) 204-4202 Home (no voice mail please)
mwkimpelatgmaildotcom
[R] Confused about Tukey mult. comp. after ANCOVA
Hi,

I am reposting this as I fear my original post (on Oct. 4th) got buried by all the excitement of the R 2.6 release...

I had a first occasion to try multiple comparisons (of intercepts, I suppose) following a significant result in an ANCOVA. As until now I was doing this with JMP, I compared my results, and the post-hoc comparisons were different between R and JMP. I chose to use an example data set from JMP because it was small, so I can show it here. It is not the best example for an ANCOVA because the factor Drug does not have a significant effect, but it will do.

> drug$x
 [1] 11  8  5 14 19  6 10  6 11  3  6  6  7  8 18  8 19  8  5 15 16 13 11  9 21 16 12
[28] 12  7 12
> drug$y
 [1]  6  0  2  8 11  4 13  1  8  0  0  2  3  1 18  4 14  9  1  9 13 10 18  5 23 12  5
[28] 16  1 20
> drug$Drug
 [1] a a a a a a a a a a d d d d d d d d d d f f f f f f f f f f
Levels: a d f

I did not manage to get TukeyHSD to work if I fitted the ANCOVA with lm, so I used aov:

> my.anc <- aov(y ~ x + Drug, data = drug)
> summary(my.anc)
            Df Sum Sq Mean Sq F value    Pr(>F)
x            1 802.94  802.94 50.0393 1.639e-07 ***
Drug         2  68.55   34.28  2.1361    0.1384
Residuals   26 417.20   16.05
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I tried this to compare the Drugs, correcting for the effect of x:

> TukeyHSD(my.anc, "Drug")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x + Drug, data = drug)

$Drug
          diff       lwr      upr     p adj
d-a 0.03131758 -4.420216 4.482851 0.9998315
f-a 3.04677613 -1.404758 7.498310 0.2239746
f-d 3.01545855 -1.436075 7.466992 0.2305187

Warning message:
non-factors ignored: x in: replications(paste("~", xx), data = mf)

I am not sure about the warning; maybe it is the reason the differences shown here are different from those shown in JMP for the same analysis. Maybe TukeyHSD is not meant to be used with non-factors (i.e., not valid for ANCOVAs)? I just found the package multcomp and am not sure I understand it well yet, but its Tukey comparisons gave the same results as JMP.
> summary(glht(m3, linfct = mcp(Drug = "Tukey")))

	 Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: aov(formula = y ~ x + Drug, data = drug)

Linear Hypotheses:
           Estimate Std. Error t value p value
d - a == 0    0.109      1.795   0.061   0.998
f - a == 0    3.446      1.887   1.826   0.181
f - d == 0    3.337      1.854   1.800   0.189
(Adjusted p values reported)

I would very much like to understand why these two Tukey tests gave different results in R.

Thanks in advance,
Denis
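For anyone who wants to reproduce the comparison, the vectors printed in the post can be reassembled into a data frame and both procedures run side by side. A sketch (the glht() part is skipped if multcomp is not installed); the comments reflect one plausible reading of the warning, not a definitive diagnosis:

```r
## Reconstruct the JMP example data set printed in the post.
drug <- data.frame(
  x = c(11, 8, 5, 14, 19, 6, 10, 6, 11, 3,
        6, 6, 7, 8, 18, 8, 19, 8, 5, 15,
        16, 13, 11, 9, 21, 16, 12, 12, 7, 12),
  y = c(6, 0, 2, 8, 11, 4, 13, 1, 8, 0,
        0, 2, 3, 1, 18, 4, 14, 9, 1, 9,
        13, 10, 18, 5, 23, 12, 5, 16, 1, 20),
  Drug = factor(rep(c("a", "d", "f"), each = 10))
)

my.anc <- aov(y ~ x + Drug, data = drug)

## TukeyHSD() warns that it ignores the non-factor x, which is consistent
## with its intervals differing from the covariate-adjusted results...
TukeyHSD(my.anc, "Drug")

## ...while glht() tests the Drug contrasts from the fitted model itself,
## i.e. adjusted for x -- which would explain why it agrees with JMP.
if (requireNamespace("multcomp", quietly = TRUE)) {
  summary(multcomp::glht(my.anc, linfct = multcomp::mcp(Drug = "Tukey")))
}
```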