Re: [R] multivariate problem
Would some type of multivariate SPC be useful? Potentially useful options might include SPC based on PCA, or perhaps Hotelling's T2. You might also find something useful in a package such as rrcov: http://cran.r-project.org/web/packages/rrcov/vignettes/rrcov.pdf

R/S Cliff

On Sun, Jan 8, 2012 at 11:16 AM, David Winsemius dwinsem...@comcast.net wrote:

On Jan 8, 2012, at 3:01 AM, Iasonas Lamprianou wrote:

Dear all, I am not sure if this is the right place to ask this question, but I will have a go. Please redirect me to a different place if this is not the right one! I have a (relatively) simple problem which causes me some frustration because I cannot find the solution. I measure ten variables (var1 to var10) every day; they are all continuous (linear) and most of them are correlated. On some days, for whatever reason, the relationship between these variables may change. They remain correlated, but their correlation may shift slightly, and in practice this shift is important. Or one of the variables may suddenly increase its value significantly, keep this high value for a few days, and then return to the normal level. I am using R. Is there any function I can use to help me identify these strange days when the relationship between these variables changes? For example, if DayX is such a strange day, factor analyzing the data before DayX and after DayX separately would give me different factors (principal components). But how can I identify such a day without trial and error?

The zoo package has `rollapply`. You would of course be required to be much more specific in defining your problem than you have been so far.

-- David Winsemius, MD Heritage Laboratories West Hartford, CT

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
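[To make the multivariate SPC suggestion concrete, here is a minimal sketch (not from the original thread) of a Hotelling-style T2 check for unusual days. The simulated data, the baseline window, and the chi-square cutoff are all illustrative assumptions.]

```r
## Sketch: flag "strange" days with a Hotelling-style T2 statistic
## computed against a baseline period. Simulated data; the baseline
## window and cutoff are assumptions for illustration only.
set.seed(1)
n <- 100; p <- 10
x <- matrix(rnorm(n * p), n, p)      # 100 days x 10 variables
x[60:65, 1] <- x[60:65, 1] + 5       # variable 1 jumps for a few days

base  <- x[1:30, ]                   # assume the first 30 days are "in control"
mu    <- colMeans(base)
S.inv <- solve(cov(base))

t2 <- apply(x, 1, function(day) {
  d <- day - mu
  drop(t(d) %*% S.inv %*% d)         # T2 = (x - mu)' S^-1 (x - mu)
})

cutoff <- qchisq(0.999, df = p)      # rough large-sample cutoff
which(t2 > cutoff)                   # candidate "strange" days
```

If the odd days contaminate the baseline itself, rrcov's robust location/scatter estimators could replace colMeans()/cov() here.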
Re: [R] using plot with time series object - axes = FALSE option does not appear to work
Dear R-help, I am told by Professor Ripley that my question (quote) wastes the time of (and has insulted on-list) the R developers by falsely claiming there are problems with their code. I am writing, as he instructed, to publicly apologize to the R developers for any slight that my question might have generated. When posing my question, I genuinely thought that in invoking plot I was using the base R graphics package, and not the method offered by Mr. Ryan. As I am told that I owe an apology, I would like to do as I was instructed and publicly apologize on the list to the R developers, not for any malicious intent, but rather for my lack of expertise and for any unintentional insult that resulted. (My apologies also to Mr. Ryan for my ignorance.)

Signed, bruised but maybe a little less ignorant about R,
Cliff Long gnolff...@gmail.com

On Mon, Jan 3, 2011 at 12:03 AM, Clifford Long gnolff...@gmail.com wrote:

Dear R-help, I am attempting to plot data using the standard R plot utilities. The data were retrieved from FRED (St. Louis Federal Reserve) using the package quantmod. My question is NOT about quantmod: while I retrieve data using quantmod, I am not using its charting utility. I have had success using the standard R plot utilities with this type of data to this point. Eventually I want to put two series on the same plot, with the y-axis for one series on the right side and also inverted (min at the top, max at the bottom). I believe I can do this by using par(new = TRUE) with a second plot statement with axes = FALSE, followed by the command axis(side = 4, ylim = c(max(seriesname), min(seriesname))). Here is what I believe should be a smaller reproducible example of my issue:

#-
library(quantmod)
getSymbols('PCECTPI', src='FRED')
is.xts(PCECTPI)              # check the type of object - response is 'TRUE'
plot(PCECTPI)                # This works fine.
plot(PCECTPI, axes = FALSE)  # This gives me a plot, but I get the axes regardless of axes = FALSE.
#-

I did find that using par(yaxt = "n") seems to work to suppress the y-axis. But it seems to me that axes = FALSE should also work, and I believe it would be easier to use in the larger context of my goal. I have spent time with the R help pages and Nabble searches of the R-help archives, but I still seem to be missing something. Does the axes = FALSE option not work when using plot with this type of data object? Is there some other fundamental thing that I have overlooked? Or should this work? My apologies if the answer is obvious and I've just missed it. Thank you in advance for any help that can be provided.

Cliff Long gnolff...@gmail.com

My system: HP Pavilion, Windows 7. The computer/OS is 64-bit. I am running the precompiled 32-bit version of R (per sessionInfo). Thus far, everything seems to have been working as expected.

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] quantmod_0.3-15 TTR_0.20-2 xts_0.7-5 zoo_1.6-4
[5] Defaults_1.1-1

loaded via a namespace (and not attached):
[1] grid_2.12.1 lattice_0.19-13 tools_2.12.1

Sys.getlocale()
[1] LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
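[A sketch of one possible workaround. The diagnosis is an assumption, not confirmed in the thread: plot() on an xts object dispatches to xts's own plot method, which may not honor axes = FALSE, while plot.zoo() does. A stand-in series replaces PCECTPI so the example is self-contained.]

```r
## Sketch: suppress default axes via plot.zoo(), then draw axes by hand.
## The series below is an invented stand-in for PCECTPI from FRED.
library(zoo)
z <- zoo(cumsum(rnorm(48)),
         seq(as.Date("2000-01-01"), by = "month", length.out = 48))

plot(as.zoo(z), axes = FALSE, xlab = "", ylab = "")  # no axes drawn here
box()
axis(1)       # dates along the bottom
axis(4)       # values on the right-hand side

## For the inverted right-hand overlay described above, something like:
## par(new = TRUE); plot(as.zoo(z2), axes = FALSE, ylim = rev(range(z2)))
## axis(4)
```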
Re: [R] Homogeneity of regression slopes
Hi Thomas,

Thanks for the additional information. Just wondering, and hoping to learn: would any lack of homogeneity of variance (which is what I believe you mean by different standard-deviation estimates) be found when performing standard regression diagnostics, such as residual plots, Levene's test (or equivalent), etc.? If so, would a WLS routine or some type of variance-stabilizing transformation be useful? Again, hoping to learn. I'll check out the gls() routine in the nlme package, as you mentioned. Thanks.

Cliff

On Mon, Sep 13, 2010 at 10:02 PM, Thomas Stewart tgstew...@gmail.com wrote:

Allow me to add to Michael's and Clifford's responses. If you fit the same regression model for each group, then you are also fitting a standard deviation parameter for each model. The solution proposed by Michael and Clifford is a good one, but it assumes that the standard deviation parameter is the same for all three models. You may want to consider the degree to which the standard deviation estimates differ for the three separate models. If they differ wildly, the method described by Michael and Clifford may not be the best. Rather, you may want to consider gls() in the nlme package to explicitly allow the variance parameters to vary.

-tgs

On Mon, Sep 13, 2010 at 4:52 PM, Doug Adams f...@gmx.com wrote:

Hello, We've got a dataset with several variables, one of which we're using to split the data into 3 smaller subsets (as the variable takes 1 of 3 possible values). There are several more variables too, many of which we're using to fit regression models using lm. So I have 3 models fitted (one for each subset, of course), each having slope estimates for the predictor variables. What we want to find out, though, is whether or not the overall slopes for the 3 regression lines are significantly different from each other. Is there a way, in R, to calculate the overall slope of each line, and test whether there's homogeneity of regression slopes? (Am I using that phrase in the right context -- comparing the slopes of more than one regression line rather than the slopes of the predictors within the same fit?) I hope that makes sense. We really wanted to see if the predicted values at the ends of the 3 regression lines are significantly different... But I'm not sure how to do the Johnson-Neyman procedure in R, so I think testing for slope differences will suffice! Thanks to any who may be able to help!

Doug Adams
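[A minimal sketch of the gls() suggestion. The simulated data, formula, and variance structure are illustrative assumptions, not from the thread.]

```r
## Sketch: fit group-specific slopes while letting the residual variance
## differ by group via varIdent(), then compare to the equal-variance fit.
library(nlme)   # ships with R
set.seed(42)
d <- data.frame(group = rep(c("a", "b", "c"), each = 40), x = rnorm(120))
d$y <- 1 + 0.5 * d$x + rnorm(120, sd = rep(c(1, 1, 3), each = 40))

fit.equal  <- gls(y ~ x * group, data = d)                 # one sigma for all groups
fit.varied <- gls(y ~ x * group, data = d,
                  weights = varIdent(form = ~ 1 | group))  # one sigma per group

anova(fit.equal, fit.varied)  # likelihood-ratio comparison of the variance structures
```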
Re: [R] Homogeneity of regression slopes
If you'll allow me to throw in two cents ... Like Michael said, the dummy variable route is the way to go, but I believe that the coefficients on the dummy variables test for equal intercepts. For equality of slopes, don't we need the interaction between the dummy variable and the explanatory variable whose slope (coefficient) is of interest? I'll add some detail below.

For only two groups, we could use a single 2-level dummy variable D:

D = 0 is the reference level (group)
D = 1 is the other level (group)

Equality of intercepts:

y = b0 + b1*x + b2*D

If D = 0, then y = b0 + b1*x
If D = 1, then y = b0 + b1*x + b2 ... grouping like terms: y = (b0 + b2) + b1*x

If coefficient b2 = 0, then we might fail to reject the null hypothesis that the intercepts are equal. If coefficient b2 != 0, then we would reject the null hypothesis that the intercepts are equal.

Equality of slopes:

y = b0 + b1*x + b2*D + b3*x*D   (we added the interaction between x and D)

If D = 0, then y = b0 + b1*x
If D = 1, then y = b0 + b1*x + b2 + b3*x ... grouping like terms: y = (b0 + b2) + (b1 + b3)*x

If coefficient b3 = 0, then we might fail to reject the null hypothesis that the slopes are equal. If coefficient b3 != 0, then we would reject the null hypothesis that the slopes are equal.

For a model with three groups, assuming that lm / glm / etc. would really do this for you, the explicit dummy variable coding might look like:

          D1  D2
group 1    0   0   (reference level ... can usually choose)
group 2    1   0
group 3    0   1

I believe that this is called a sigma-restricted model (??), as opposed to an overparameterized model where three groups would have three dummy variables. You can probably find this info in most books on basic regression. This might be overly simplistic, and I'll happily stand corrected if I've made any mistakes. Otherwise, I hope that this helps.

Cliff

On Mon, Sep 13, 2010 at 7:12 PM, Michael Bedward michael.bedw...@gmail.com wrote:

Hello Doug, Perhaps it would just be easier to keep your data together and have a single regression with a term for the grouping variable (a factor with 3 levels). If the groups give identical results, the coefficients for the two non-reference grouping-variable levels will include 0 in their confidence intervals.

Michael

On 14 September 2010 06:52, Doug Adams f...@gmx.com wrote:

Hello, We've got a dataset with several variables, one of which we're using to split the data into 3 smaller subsets (as the variable takes 1 of 3 possible values). There are several more variables too, many of which we're using to fit regression models using lm. So I have 3 models fitted (one for each subset, of course), each having slope estimates for the predictor variables. What we want to find out, though, is whether or not the overall slopes for the 3 regression lines are significantly different from each other. Is there a way, in R, to calculate the overall slope of each line, and test whether there's homogeneity of regression slopes? (Am I using that phrase in the right context -- comparing the slopes of more than one regression line rather than the slopes of the predictors within the same fit?) I hope that makes sense. We really wanted to see if the predicted values at the ends of the 3 regression lines are significantly different... But I'm not sure how to do the Johnson-Neyman procedure in R, so I think testing for slope differences will suffice! Thanks to any who may be able to help!

Doug Adams
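[The dummy-variable/interaction argument above can be run directly by letting lm() build the dummies from a factor. A sketch on simulated data (my assumption, not the original dataset):]

```r
## Sketch: compare a common-slope model against one with group-specific
## slopes; the F-test on the x:group interaction is the homogeneity-of-
## slopes test described above.
set.seed(7)
d <- data.frame(group = factor(rep(1:3, each = 30)), x = rnorm(90))
d$y <- 2 + c(0.5, 0.5, 1.5)[d$group] * d$x + rnorm(90)  # group 3 has a steeper slope

fit.common <- lm(y ~ x + group, data = d)  # equal slopes, separate intercepts
fit.slopes <- lm(y ~ x * group, data = d)  # adds the x:group interaction

anova(fit.common, fit.slopes)  # significant F => slopes differ across groups
```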
Re: [R] Simulating points from GLM corresponding to new x-values
Would the predict routine (using 'newdata') do what you need?

Cliff Long
Hollister Incorporated

On Wed, Aug 12, 2009 at 4:33 AM, Jacob Nabe-Nielsen jacobn...@me.com wrote:

Dear List, Does anyone know how to simulate data from a GLM object corresponding to values of the independent (x) variable that do not occur in the original dataset? I have tried using simulate(), but it generates a new value of the dependent variable corresponding to each of the original x-values, which is not what I need. Ideally I would like to simulate new values for GLM objects both with family=gaussian and with family=binomial.

Thanks in advance, Jacob

Jacob Nabe-Nielsen, PhD, MSc
Scientist -- Section for Climate Effects and System Modelling
Department of Arctic Environment
National Environmental Research Institute
Aarhus University
Frederiksborgvej 399, Postbox 358
4000 Roskilde, Denmark
email: n...@dmu.dk
fax: +45 4630 1914
phone: +45 4630 1944
Re: [R] Simulating points from GLM corresponding to new x-values
Hi Jacob,

At the risk of embarrassing myself, I gave it a shot. I'll throw this out on the list, if for no other reason than that perhaps someone with higher wattage than myself might tear it apart and give you something both useful and perhaps elegant (from an R coding standpoint). (See the following ... it just ran for me, so I hope it will for you, too.) If this isn't what you need, I'll shut up and watch and learn from the others.

Regards, Cliff

I tried to put together something that might work as an example, based on a simple linear regression model but using the GLM routine. Once the model was created, I used 'predict' based on the model outcome and the original x values. I then used 'predict' based on the model outcome and new x values, along with a function for simulation of the distribution at the new x values.

#--
# Create sample data set to use with GLM
# (assume first order linear model for now)
#--
b0 = 10
b1 = 0.3
x = sort(rep(seq(1, 11, by = 2), 10))
fn.y = function(x1){y1 = b0 + b1*x1 + rnorm(n = 1, mean = 0, sd = 1)}
y = sapply(x, fn.y)
xydata = data.frame(cbind(x, y))
model1 = glm(y ~ x, family = gaussian, data = xydata)

#--
# Generate new x values
# run new x values through 'predict'
# (note: the newdata column must be named 'x' to match the model formula,
#  otherwise predict() falls back on the original data)
#--
newx = data.frame(x = sort(rep(seq(2, 10, by = 2), 12)))
y.pred = predict(model1, newx, se.fit = TRUE)

#--
# Generate simulated values based on new x values
# and function based on outcome of 'predict' routine
# (mapply pairs each fit with its own se.fit)
#--
fn.pred = function(fit, sefit){rnorm(n = 1, mean = fit, sd = sefit*sqrt(60))}
pred.sim = mapply(fn.pred, y.pred$fit, y.pred$se.fit)

#--
# Generate simulated values based on orig x values
# using 'simulate' routine
#--
y.sim = simulate(model1, nsim = 1)

#--
# Plot original x, y values
# then add simulated y values from 'simulate' based on orig x values
# and then add simulated values from 'predict' and function based on new x values
#--
plot(x, y)
lines(x, y.sim$sim_1, col='red', type='p')
lines(newx[,1], pred.sim, col='darkblue', type='p')
#--
# END
#--

On Wed,
Aug 12, 2009 at 2:51 PM, Jacob Nabe-Nielsen jacobn...@me.com wrote:

Hi Cliff -- thanks for the suggestion. I tried extracting the predicted mean and standard error using predict(). Afterwards I simulated the dependent variable using rnorm(), with mean and standard deviation taken from the predict() function (sd = sqrt(n)*se). The points obtained this way were scattered far too much (compared to points obtained with simulate()) -- I am not quite sure why. Unfortunately the documentation of the simulate() function does not provide much information about how it is implemented, which makes it difficult to judge which method is best (predict() or simulate()), and it is also unclear whether simulate() can be applied to glms (with family=gaussian or binomial). Any suggestions for how to proceed?

Jacob

On 12 Aug 2009, at 13:11, Clifford Long wrote:

Would the predict routine (using 'newdata') do what you need?

Cliff Long
Hollister Incorporated

On Wed, Aug 12, 2009 at 4:33 AM, Jacob Nabe-Nielsen jacobn...@me.com wrote:

Dear List, Does anyone know how to simulate data from a GLM object corresponding to values of the independent (x) variable that do not occur in the original dataset? I have tried using simulate(), but it generates a new value of the dependent variable corresponding to each of the original x-values, which is not what I need. Ideally I would like to simulate new values for GLM objects both with family=gaussian and with family=binomial.

Thanks in advance, Jacob

Jacob Nabe-Nielsen, PhD, MSc
Scientist -- Section for Climate Effects and System Modelling
Department of Arctic Environment
National Environmental Research Institute
Aarhus University
Frederiksborgvej 399, Postbox 358
4000 Roskilde, Denmark
email: n...@dmu.dk
fax: +45 4630 1914
phone: +45 4630 1944
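[One hedged guess at why the rnorm()-from-predict() points scattered too much (an assumption, not a verified account of simulate()'s internals): sd = sqrt(n)*se inflates the spread, whereas new observations for a gaussian GLM can be drawn with the residual standard deviation, plus at most the standard error of the fitted mean. A sketch on simulated data:]

```r
## Sketch: simulate new responses at new x-values using the residual sd
## (sigma) rather than sqrt(n) * se.fit. Data and model are invented.
set.seed(1)
d <- data.frame(x = runif(60, 1, 11))
d$y <- 10 + 0.3 * d$x + rnorm(60)
m <- glm(y ~ x, family = gaussian, data = d)

newx <- data.frame(x = seq(2, 10, by = 2))
p <- predict(m, newdata = newx, se.fit = TRUE)

sigma <- sqrt(summary(m)$dispersion)             # residual sd estimate
y.new <- rnorm(nrow(newx), mean = p$fit,
               sd = sqrt(sigma^2 + p$se.fit^2))  # prediction-level, not confidence-level, scatter
```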
[R] ROC curve using epicalc (after logistic regression) (re-sent)
Dear R-help,

I am resending, as I believe I screwed up the e-mail address to R-help earlier. Sorry for my lack of attention to detail, and for any inconvenience. I have also sent the question to the package maintainer, as suggested in the posting guide.

Regards, Cliff

-- Forwarded message --
From: Clifford Long gnolff...@gmail.com
Date: Sun, Jul 26, 2009 at 8:46 PM
Subject: Fwd: ROC curve using epicalc (after logistic regression)
To: cvira...@medicine.psu.ac.th

Dear Virasakdi Chongsuvivatwong,

After sending the message below to the R-help mailing list, it occurred to me that I probably should also have sent a copy to you, per R posting guidance. I would be interested in any thoughts or suggestions that you might have regarding my difficulty using the lroc routine in the epicalc package. (I've used this before, and find it to be a very helpful package ... thanks.) Is my issue related to the way the data are structured for the glm routine - meaning not with individual cases, but instead by counts (per DOE treatment) of pass, fail, and total? Or perhaps I've made another error? I'll understand if you don't have the time to look this over. In case you do, any direction/guidance will be appreciated. Thank you for your time, and for this excellent package.

Regards, Cliff Long

-- Forwarded message --
From: Clifford Long gnolff...@gmail.com
Date: Sun, Jul 26, 2009 at 3:52 PM
Subject: ROC curve using epicalc (after logistic regression)
To: R-help@r-project.org

Dear R-help list, I'm attempting to use the ROC routine from the epicalc package after performing a logistic regression analysis. My code is included after the sessionInfo() result. The datafile (GasketMelt1.csv) is attached. I updated both R and the epicalc package and tried again before sending this request.
sessionInfo result:

R version 2.9.1 (2009-06-26)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] caret_4.19 lattice_0.17-25 epicalc_2.9.1.2 survival_2.35-4
[5] foreign_0.8-36

loaded via a namespace (and not attached):
[1] grid_2.9.1 tools_2.9.1

Header information from package 'epicalc':
Package: epicalc
Version: 2.9.1.2
Date: 2009-07-14

My code ...

#
# Logistic Regression (the model result is as expected)
#
dfile = 'GasketMelt1.csv'
gmelt.df = read.csv(dfile, header = TRUE, as.is = TRUE)
names(gmelt.df)
gmelt.df$p = gmelt.df$Pass / gmelt.df$Total
gmelt.glm = glm(p ~ Time + Temperature + Depth + Time*Temperature +
                    Time*Depth + Temperature*Depth,
                family = binomial(link = logit),
                data = gmelt.df, weight = Total)
summary(gmelt.glm)

#
# ROC
#
library(epicalc)
lroc(gmelt.glm, graph = TRUE, line.col = "red")

The error message:

lroc(gmelt.glm, graph = TRUE, line.col = "red")
Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

Have I overlooked something? Many thanks to anyone who might have a suggestion.

Cliff
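[A hedged guess at the lroc() problem (an assumption only, since the data file is not available here): lroc() may expect a binomial glm fit per case or in the two-column cbind(successes, failures) form rather than a weighted proportion response. The data values below are invented stand-ins for GasketMelt1.csv, reusing the post's column names.]

```r
## Sketch: refit the aggregated pass/fail data with a cbind() response.
## All data values are hypothetical stand-ins.
gmelt.df <- data.frame(Time        = c(1,   1,   2,   2,   1.5),
                       Temperature = c(150, 170, 150, 170, 160),
                       Depth       = c(5,   10,  10,  5,   7.5),
                       Pass        = c(8,   9,   5,   7,   8),
                       Total       = c(10,  10,  10,  10,  10))
gmelt.glm <- glm(cbind(Pass, Total - Pass) ~ Time + Temperature + Depth,
                 family = binomial, data = gmelt.df)
summary(gmelt.glm)
# then retry: library(epicalc); lroc(gmelt.glm, graph = TRUE, line.col = "red")
```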
[R] Fwd: question about using _apply and/or aggregate functions
Resending, as I am not sure about the original To: address. Sorry for any redundancy.

- Cliff

-- Forwarded message --
From: Clifford Long gnolff...@gmail.com
Date: Mon, Jun 22, 2009 at 11:04 AM
Subject: question about using _apply and/or aggregate functions
To: r-h...@lists.r-project.org

Hi R-list, I'll apologize in advance for (1) the wordiness of my note (not sure how to avoid it) and (2) any deficiencies on my part that led to my difficulties. I have an application with several stages that is meant to simulate and explore different scenarios with respect to product sales (in units sold per month). My session info is at the bottom of this note.

The steps include (1) an initial linear regression model, (2) building an ARIMA model, (3) creating 12 months of simulated 'real' data - for various values of projected changes in slope from the linear regression - from the ARIMA model using arima.sim, with subsequent undifferencing and appending to historical data, (4) one-step-ahead forecasting for 12 months using the 'fixed' ARIMA model and simulated data, (5) taking the residuals from the forecasting (simulated minus forecast for each of the 12 months) and applying QC charting, and (6) counting the number (if any) of runs-rules violations detected. The simulation calculates 100 iterations for each of 50 values of slope. All of this seems to work fine (although I'm sure that the coding could be improved through functions, vectorization, etc.).

However, the problem I'm having is at the end, where I had hoped that things would be easier. I want to summarize and graph the probability of detecting a runs-rule violation vs. the amount of the shift in slope (of log unit sales).

The output data array passed from the qcc section at the end includes:

- the adjustment made to the slope (a multiplier)
- the actual value of the slope
- the iteration number of the simulation loop (within each value of slope)
- the count of QC charting limits violations
- the count of QC charting runs-rules violations

My code is in the attached file (generic_code.R) and my initial sales data needed to prime the simulation is in the other attached file (generic_data.csv). The relevant section of my code is at the bottom of the .R file, after the end of the outer loop. I've tried to provide meaningful comments. I've read the help files for _apply, aggregate, arrays and data types, and have also consulted several texts (Maindonald and Braun; Spector; Venables and Ripley for S-PLUS). Somehow I still don't get it. My attempts usually result in a message like the following:

agg.result = aggregate(simarray.part[,3], by=list4, FUN = mean)
Error in FUN(X[[1L]], ...) : arguments must have same length

But when I check the length of the arguments, they appear to match. (??)

length(simarray.part[,3])
[1] 5000
length(simarray.part[,4])
[1] 5000
length(list4)
[1] 5000

I would rather have passed along a subset of the simulation/loop output dataset, but was unsure how to save it so as to preserve whatever properties I may have missed that are causing my difficulties. If anyone still wants to help at this point, I believe that you'll need to load the .csv data and run the simulation (maybe reducing the number of iterations). Many thanks to anyone who can shed some light on my difficulties (whether self-induced or otherwise).

Cliff

I'm using a pre-compiled binary version of R for Windows.

Session info:

sessionInfo()
R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] qcc_1.3 forecast_1.24 tseries_0.10-18 zoo_1.5-5
[5] quadprog_1.4-11

loaded via a namespace (and not attached):
[1] grid_2.9.0 lattice_0.17-22

Sys.getlocale()
[1] LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
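[A hedged guess at the aggregate() error (an assumption, since the attached files are not available here): `by` must be a *list* of grouping vectors, so a bare vector of matching length can still fail; wrapping it in list() is the usual fix. The stand-in data below are invented, shaped like the described output array (50 slope values x 100 iterations):]

```r
## Sketch: aggregate() with by = list(...) on hypothetical stand-in data.
set.seed(1)
simarray.part <- cbind(slope.adj = rep(1:50, each = 100),
                       slope     = rep(1:50, each = 100) * 0.01,
                       runs.viol = rbinom(5000, 1, 0.2),
                       lim.viol  = rbinom(5000, 1, 0.1))
list4 <- simarray.part[, "slope.adj"]    # grouping vector, length 5000

# aggregate(simarray.part[, 3], by = list4, FUN = mean)   # bare vector: errors
agg.result <- aggregate(simarray.part[, 3],
                        by = list(slope.adj = list4),     # wrapped in list(): works
                        FUN = mean)
head(agg.result)   # mean runs-rule violation rate per slope value
```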
[R] question about using _apply and/or aggregate functions
Hi R-list,

I'll apologize in advance for (1) the wordiness of my note (not sure how to avoid it) and (2) any deficiencies on my part that led to my difficulties.

I have an application with several stages that is meant to simulate and explore different scenarios with respect to product sales (in units sold per month). My session info is at the bottom of this note. The steps include (1) an initial linear regression model, (2) building an ARIMA model, (3) creating 12 months of simulated 'real' data - for various values of projected changes in slope from the linear regression - from the ARIMA model using arima.sim, with subsequent undifferencing and appending to historical data, (4) one-step-ahead forecasting for 12 months using the 'fixed' ARIMA model and simulated data, (5) taking the residuals from the forecasting (simulated minus forecast for each of the 12 months) and applying QC charting, and (6) counting the number (if any) of runs-rules violations detected. The simulation aspect calculates 100 simulations for each of 50 values of slope.

All of this seems to work fine (although I'm sure the coding could be improved through functions, vectorization, etc.). However, the problem I'm having is at the end, where I had hoped that things would be easier. I want to summarize and graph the probability of detecting a runs-rule violation vs. the amount of the shift in slope (of log unit sales).

The output data array passed from the qcc section at the end includes:
- the adjustment made to the slope (a multiplier)
- the actual value of the slope
- the iteration number of the simulation loop (within each value of slope)
- the count of QC charting limits violations
- the count of QC charting runs-rules violations

My code is in the attached file (generic_code.R) and my initial sales data needed to prime the simulation is in the other attached file (generic_data.csv). The relevant section of my code is at the bottom of the .R file, after the end of the outer loop.
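For concreteness, the simulate-and-append steps can be sketched roughly as below. This is only a minimal sketch with a toy series; the series, the ARIMA(0,1,1) order, and all object names are my own assumptions, not taken from the attached files.

```r
# Minimal sketch of steps (2)-(3), using a toy log-sales series 'y'
# (series, ARIMA order, and object names are hypothetical)
set.seed(42)
y   <- log(50 + cumsum(rnorm(60, mean = 0.5)))     # toy historical data
fit <- arima(y, order = c(0, 1, 1))                # step (2): fit an ARIMA model
sim <- arima.sim(n = 12,
                 model = list(ma = coef(fit)["ma1"]),
                 sd = sqrt(fit$sigma2))            # step (3): simulate differences
sim.undiff <- diffinv(sim, xi = tail(y, 1))[-1]    # undifference onto last value
y.ext <- c(y, sim.undiff)                          # append to historical data
length(y.ext)                                      # 72 = 60 history + 12 simulated
```

From there the extended series can be fed to the one-step-ahead forecasting and qcc steps.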
I've tried to provide meaningful comments. I've read the help files for _apply, aggregate, arrays and data types, and have also consulted several texts (Maindonald and Braun; Spector; Venables and Ripley for S-PLUS). Somehow I still don't get it. My attempts usually result in a message like the following:

agg.result = aggregate(simarray.part[,3], by=list4, FUN = mean)
Error in FUN(X[[1L]], ...) : arguments must have same length

But when I check the length of the arguments, they appear to match. (??)

length(simarray.part[,3])
[1] 5000
length(simarray.part[,4])
[1] 5000
length(list4)
[1] 5000

I would rather have passed along a subset of the simulation/loop output dataset, but was unsure how to save it so as to preserve whatever properties I may have missed that are causing my difficulties. If anyone still wants to help at this point, I believe you'll need to load the .csv data and run the simulation (maybe reducing the number of iterations). Many thanks to anyone who can shed some light on my difficulties (whether self-induced or otherwise).

Cliff

I'm using a pre-compiled binary version of R for Windows.
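For later readers: the "arguments must have same length" error typically comes from passing something other than a plain list of grouping vectors as `by`. A minimal sketch with fabricated stand-in data (the column names are my guesses at the simulation output, not the real files):

```r
# Toy stand-in for the simulation output array (values are fabricated)
set.seed(1)
simarray.part <- cbind(slope.mult       = rep(1:5, each = 10),
                       adj.slope.value  = rnorm(50),
                       qc.run.violation = rbinom(50, 1, 0.3),
                       adj.slope.level  = rep(1:5, each = 10))

# 'by' must be a list of grouping vectors, not a bare vector:
agg.result <- aggregate(simarray.part[, 3],
                        by = list(level = simarray.part[, 4]),
                        FUN = mean)
nrow(agg.result)   # one row per distinct level: 5
```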
Session info:

sessionInfo()
R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] qcc_1.3 forecast_1.24 tseries_0.10-18 zoo_1.5-5
[5] quadprog_1.4-11

loaded via a namespace (and not attached):
[1] grid_2.9.0 lattice_0.17-22

Sys.getlocale()
[1] LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] question about using _apply and/or aggregate functions
Hi David,

I appreciate the advice. I had coerced 'list4' with as.list, but forgot to wrap it in list() in the call to aggregate. I made the correction, and now get the following:

slope.mult = simarray[,1]
adj.slope.value = simarray[,2]
adj.slope.level = simarray[,2]
qc.run.violation = simarray[,5]
simarray.part = cbind(slope.mult, adj.slope.value, qc.run.violation, adj.slope.level)
list4 = as.list(simarray.part[,4])
agg.result = aggregate(simarray.part[,3], by=list(list4), FUN = mean)
Error in sort.list(unique.default(x), na.last = TRUE) :
  'x' must be atomic for 'sort.list'
  Have you called 'sort' on a list?

... I'm not sure what this means I've done wrong. I did check 'list4' using is.list and get TRUE back as an answer, so I feel that my mistake is some other fundamental aspect of R that I'm failing to grasp.

To your note on 'tapply' ... I did try this as well (actually, tried it first) with no initial success. On your recommendation, I gave tapply another go, and get something recognizable:

vtt = tapply(simarray.part[,3], simarray.part[,2], mean)
dim(vtt)
[1] 50
length(vtt)
[1] 50
vtt[1:5]
0.003132 0.006264 0.009396 0.012528 0.01566
    0.29     0.24     0.23     0.16     0.22
vtt[1]
0.003132
    0.29
vtt[[1]][1]
[1] 0.29

I see that the output stored in vtt has one dimension with length = 50. But each place in vtt appears to hold two values. I'll admit that I'm not sure how to access/reference the entirety of all 50 values = 0.29 0.24 0.23 0.16 0.22 (and so on). I don't appear to be able to access/reference only what appears to be an embedded index = 0.003132 0.006264 0.009396 etc. What am I missing? Is there a reference that I need to re-read? I would like to be able to plot one against the other.

Thanks again for taking the time outside of your day job for your earlier reply!
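For later readers: the sort.list error arises because list4 was built with as.list(), so the grouping variable is itself a list of 5000 one-element entries; wrapping that in list() hands aggregate a list-of-lists, which sort.list() cannot order. A small illustration with toy data (not the actual simulation output):

```r
g <- rep(1:2, each = 3)
x <- c(5, 7, 9, 2, 4, 6)

## aggregate(x, by = list(as.list(g)), FUN = mean)
## -> Error: 'x' must be atomic for 'sort.list'

# Pass the plain atomic vector inside list() instead:
aggregate(x, by = list(group = g), FUN = mean)   # group means: 7 and 4
```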
Cliff

On Mon, Jun 22, 2009 at 11:28 AM, David Winsemius dwinsem...@comcast.net wrote:

On Jun 22, 2009, at 12:04 PM, Clifford Long wrote:

snipped other stuff... My attempts usually result in a message like the following:

agg.result = aggregate(simarray.part[,3], by=list4, FUN = mean)
Error in FUN(X[[1L]], ...) : arguments must have same length

I cannot comment on the overall strategy, but wondered if this minor mod to the code might succeed:

agg.result = aggregate(simarray.part[,3], by=list(list4), FUN = mean)

My personal experience with aggregate() is not positive. I generally end up turning to tapply() (which is at the heart of aggregate() anyway), probably because I forget to wrap the second argument in a list. Slow learner, I guess.
Re: [R] question about using _apply and/or aggregate functions
David,

Once again, many thanks for your very useful and timely feedback, and for your patience with my learning curve.

Sincerely,
Cliff

On Mon, Jun 22, 2009 at 7:11 PM, David Winsemius dwinsem...@comcast.net wrote:

On Jun 22, 2009, at 7:55 PM, David Winsemius wrote:

On Jun 22, 2009, at 6:16 PM, Clifford Long wrote:

snipped other stuff...

vtt = tapply(simarray.part[,3], simarray.part[,2], mean)

... I would like to be able to plot one against the other.

plot(names(vtt), vtt)

Or perhaps:

plot(as.numeric(names(vtt)), vtt)

-- David Winsemius, MD Heritage Laboratories West Hartford, CT
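To illustrate what tapply() returns: it is a single named vector, not pairs of values; the "embedded index" that prints above each value is just the names attribute. A toy example (the real vtt has 50 groups; the group values here are made up to resemble the thread's output):

```r
grp <- rep(c(0.003132, 0.006264, 0.009396), each = 4)   # toy slope values
val <- c(1, 0, 1, 0,  0, 0, 1, 0,  1, 1, 0, 0)          # toy violation flags
vtt <- tapply(val, grp, mean)

names(vtt)     # the "embedded index": character names, one per group
unname(vtt)    # the mean values themselves: 0.50 0.25 0.50

# Plot probability of violation vs. slope, as David suggests:
plot(as.numeric(names(vtt)), vtt, type = "b",
     xlab = "slope value", ylab = "P(runs-rule violation)")
```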
Re: [R] p-values for ARIMA coefficients
Hi Myriam,

I'll take a stab at it, but can't offer the elegance in the solution that more experienced R folks might deliver. I believe the arima function provides both point estimates and their standard errors for the coefficients. You can use these as you would a mean and standard error in a one-sample t- or z-test, with the null hypothesis being that the coefficient value is zero. If the interval estimate of the coefficient doesn't span zero, then you reject the null hypothesis at whatever Type I error level you chose. The R set of distribution functions might be useful for this. Having offered that as a possible interim solution, you'd be well served to see what other advice more experienced R users might offer.

Regards,
Cliff

On Mon, Jun 22, 2009 at 6:38 PM, m.gha...@yahoo.fr wrote:

Hi, I'm a beginner using R and I'm modeling a time series with ARIMA. I'm looking for a way to determine the p-values of the coefficients of my model. Does the arima function return these values, or is there a way to determine them easily? Thanks for your answer.

Myriam
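The approach above can be sketched in a few lines, using the estimated coefficient variance matrix that arima() returns in its var.coef component. This is a rough normal-approximation sketch; the example series (the built-in 'lh' data) and the AR(1) order are arbitrary choices for illustration:

```r
# Approximate z-tests for ARIMA coefficients (normal approximation)
fit <- arima(lh, order = c(1, 0, 0))    # 'lh' is a built-in example series
est <- coef(fit)                        # point estimates
se  <- sqrt(diag(fit$var.coef))         # standard errors of the coefficients
z   <- est / se                         # z statistics under H0: coefficient = 0
p   <- 2 * pnorm(-abs(z))               # two-sided p-values
cbind(estimate = est, se = se, z = z, p.value = p)
```

For short series a t-reference might be more conservative, but the z approximation is the usual quick check.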