Re: [R] if value is in vector, perform this function
I think the command you want is:

    if (t %in% feed_days) C_A <- 1.5 else C_A <- 0

Do not confuse `%in%` (which essentially asks "are the left-hand values in the right-hand vector?") with the `in` of the `for` loop.

By the way, `if (t == TRUE)` is redundant; better is:

    if (t)

Pat

On 02/03/2013 23:57, Louise Stevenson wrote:
> Hi,
>
> I'm trying to set up R to run a simulation of two populations in which every 3.5 days, the initial value of one of the populations is reset to 1.5. I'm simulating an experiment we did in which we fed Daphnia populations twice a week with algae, so I want the initial value of the algal population to reset to 1.5 twice a week to simulate that feeding. I've used for loops and if/else statements before, but I can't figure out the syntax for "if t is in this vector of possible t values, do this command, else do this other command", if that makes sense. Here's what I have (and it doesn't work):
>
>     params = c(1, 0.15, 0.164, 1)
>     init = c(1.5, 0.05)
>     t = seq(1, 60, by = 0.5)  # all time values, experiment ran for 60 days
>
>     # feeding sequence - every 3.5 days
>     feed_days = seq(1, 60, by = 3.5)
>
>     Daphnia <- function(t, x, params){
>       C_D = x[2];
>       C_A = 0;
>       for(t %in% feed_days){
>         if t == TRUE {
>           C_A = 1.5
>         }
>         else{
>           C_A = 0
>         }}
>       gamma = params[1]; m_D = params[2]; K_q = params[3]; q_max = params[4];
>       M_D = m_D * C_D
>       I_A = (C_D * q_max * C_A) / (K_q + C_A)
>       r_D = gamma * I_A
>       return(list(c(- I_A, r_D - M_D)))
>     }
>
>     library(deSolve)
>     results <- ode(init, t, Daphnia, params, method = "lsoda")
>
> Let me know if there's any other info that would be helpful and thanks very much for your help!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
-- 
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of: 'Impatient R' 'The R Inferno' 'Tao Te Programming')
Re: [R] if value is in vector, perform this function
On 03-03-2013, at 00:57, Louise Stevenson louise.steven...@lifesci.ucsb.edu wrote:
> Here's what I have (and it doesn't work):
>
>     params = c(1, 0.15, 0.164, 1)
>     init = c(1.5, 0.05)
>     t = seq(1, 60, by = 0.5)  # all time values, experiment ran for 60 days
>
>     # feeding sequence - every 3.5 days
>     feed_days = seq(1, 60, by = 3.5)
>
>     Daphnia <- function(t, x, params){
>       C_D = x[2];
>       C_A = 0;
>       for(t %in% feed_days){
>         if t == TRUE {
>           C_A = 1.5
>         }
>         else{
>           C_A = 0
>         }}
>       gamma = params[1]; m_D = params[2]; K_q = params[3]; q_max = params[4];
>       M_D = m_D * C_D
>       I_A = (C_D * q_max * C_A) / (K_q + C_A)
>       r_D = gamma * I_A
>       return(list(c(- I_A, r_D - M_D)))
>     }
>
>     library(deSolve)
>     results <- ode(init, t, Daphnia, params, method = "lsoda")

You have been given a correction for the expression `for (t %in% feed_days)`. But even with that correction, things will not work the way you seem to want.

The argument t of function Daphnia is the integration time that the ode solver passes in, and it is almost certainly NOT an element of the vector t defined at the start of your script. That t is the time sequence for which output is wanted (see the ode help); it is what is put into the output of ode. There is no reason to assume that the Daphnia argument t is an element of feed_days. You can easily check this by inserting a print(t) in Daphnia. So C_A will be 0 most of the time.

It would certainly help if you named the elements of the init vector and the return list of Daphnia. In Daphnia, x[2] is C_D. But what is x[1] (C_A?)?

I think you will have to look at deSolve events, but I'm not sure if that is possible or required/desired with your model.

Berend
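The events suggestion can be sketched roughly as follows. This is only a sketch under assumptions that the thread does not confirm: that x[1] is the algal concentration C_A, that C_A should be read from the state vector rather than set inside the derivative function, and that a feeding "replaces" C_A with 1.5. The names `times` and `feeding` are made up for illustration; `events = list(data = ...)` with columns var, time, value and method is the deSolve event interface.

```r
library(deSolve)

# Naming the vectors (as suggested above) lets the event table refer to "C_A".
# Assumption: the first state variable is the algal concentration C_A.
params <- c(gamma = 1, m_D = 0.15, K_q = 0.164, q_max = 1)
init   <- c(C_A = 1.5, C_D = 0.05)
times  <- seq(1, 60, by = 0.5)       # output times, experiment ran for 60 days
feed_days <- seq(1, 60, by = 3.5)    # feeding every 3.5 days

Daphnia <- function(t, x, params) {
  with(as.list(c(x, params)), {
    M_D <- m_D * C_D                           # mortality
    I_A <- (C_D * q_max * C_A) / (K_q + C_A)   # ingestion of algae
    r_D <- gamma * I_A                         # growth from ingestion
    list(c(-I_A, r_D - M_D))                   # dC_A/dt, dC_D/dt
  })
}

# One event per feeding: reset C_A to 1.5. The first feed day is skipped
# because init already starts C_A at 1.5.
feeding <- data.frame(var = "C_A", time = feed_days[-1],
                      value = 1.5, method = "replace")

results <- ode(y = init, times = times, func = Daphnia, parms = params,
               method = "lsoda", events = list(data = feeding))
head(results)
```

All event times here are also output times, which is what deSolve expects for data-frame events; whether this matches the intended biology is for the modeller to judge.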
Re: [R] if value is in vector, perform this function
I forgot to say:

Also do not depend on exact equality in this situation. You want to test equality with a tolerance. See Circle 1 of 'The R Inferno':

http://www.burns-stat.com/documents/books/the-r-inferno/

I also see that 't' is a vector, unlike what I was thinking before, so you want to use 'ifelse':

    C_A <- ifelse(t %in% feed_days, 1.5, 0)

except that this still leaves out the tolerance. If you are always only going to go by half-days, then the following should work:

    C_A <- ifelse(round(2 * t) %in% round(2 * feed_days), 1.5, 0)

Pat
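A tolerance-based membership test could look something like the sketch below. The helper name `in_with_tol` and the default tolerance are made up for illustration; only base R is used.

```r
# TRUE for each element of `t` that lies within `tol` of some value in `set`.
# `in_with_tol` and its default `tol` are illustrative choices, not from the thread.
in_with_tol <- function(t, set, tol = 1e-8) {
  vapply(t, function(ti) any(abs(ti - set) <= tol), logical(1))
}

grid <- seq(0, 1, by = 0.1)
0.3 %in% grid            # exact comparison; can fail because of rounding error
in_with_tol(0.3, grid)   # comparison with a tolerance

feed_days <- seq(1, 60, by = 3.5)
t <- seq(1, 60, by = 0.5)
C_A <- ifelse(in_with_tol(t, feed_days), 1.5, 0)
```

With half-day steps every value happens to be exactly representable, so `%in%` and the tolerance version agree here; the tolerance matters once steps such as 0.1 are involved.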
Re: [R] Empirical Bayes Estimator for Poisson-Gamma Parameters
Dear Nicole,

First of all, my sincere thanks for your kind reply. As I told Mr. Gunter, this is part of my research and is not homework. However, I am going to clarify the problem.

Suppose we have received one observation from a Poisson distribution, i.e. Y_1 ~ Pois(Lam_1), where Lam_1 ~ Gamma(alpha_1, beta_1). Now, what is the empirical Bayes (EB) estimate of alpha_1 and beta_1? Let Y_2 ~ Pois(Lam_2) and Lam_2 ~ Gamma(alpha_2, beta_2). Again, how can we calculate the EB estimate of alpha_2 and beta_2?

In fact, I read the relevant paper by Robbins at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC350425/ but it gave 0 for Y_1. And when Var(Y) < E(Y), it generates a negative value for alpha/beta, which should be positive!

Any idea?

Kind regards,

-----Original Message-----
From: Nicole Ford [mailto:nicole.f...@me.com]
Sent: Sunday, March 03, 2013 4:09 AM
To: Bert Gunter
Cc: Boroumideh-Ali Akbar; r-help@r-project.org
Subject: Re: [R] Empirical Bayes Estimator for Poisson-Gamma Parameters

also, kruschke at indiana has some info on this, both online and youtube. (if homework.) if not, more info will be helpful.

~n

On Feb 25, 2013, at 9:41 AM, Bert Gunter wrote:
> Homework? We don't do homework here.
>
> If not, search (e.g. via google -- R hierarchical Bayes -- or some such).
>
> -- Bert
>
> On Mon, Feb 25, 2013 at 1:39 AM, Ali A. Bromideh a.bromi...@ikco.com wrote:
> > Dear Sir/Madam,
> >
> > I apologize for any cross-posting. I have a simple question, which I thought the R list might help me answer. Suppose we have Y_1, Y_2, ..., Y_n ~ Poisson(Lambda_i) and Lambda_i ~ Gamma(alpha_i, beta_i). Empirical Bayes estimators for the hyper-parameters of the gamma distribution, i.e. (alpha_t, beta_t), are needed.
> >
> >     y = c(12, 5, 17, 14)
> >     n = 4
> >
> > What about hierarchical Bayes estimators? Any relevant work and code in R (or S+) is highly appreciated.
> >
> > Kind regards,
> > Ali
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
> Internal Contact Info: Phone: 467-7374
> Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
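For what it is worth, the negative-estimate symptom described above can be reproduced with a simple method-of-moments sketch. It rests on an assumption the posts do not make: that all Y_i share one pair (alpha, beta), with beta a rate parameter, so that marginally E(Y) = alpha/beta and Var(Y) = alpha/beta + alpha/beta^2. With a separate (alpha_i, beta_i) for every single observation there is nothing to estimate them from. The function name `eb_gamma_moments` is made up for illustration.

```r
# Method-of-moments estimates of a shared Gamma(alpha, rate = beta) prior
# for Poisson counts. Assumes common hyper-parameters across observations.
eb_gamma_moments <- function(y) {
  m <- mean(y)
  v <- var(y)
  if (v <= m) {
    # Under-dispersed sample: the moment equations have no positive solution.
    return(list(alpha = NA_real_, beta = NA_real_, mean = m, var = v))
  }
  beta  <- m / (v - m)   # from Var(Y) - E(Y) = alpha / beta^2 = E(Y) / beta
  alpha <- m * beta      # from E(Y) = alpha / beta
  list(alpha = alpha, beta = beta, mean = m, var = v)
}

y <- c(12, 5, 17, 14)
est <- eb_gamma_moments(y)

# Posterior of each Lambda_i given y_i is Gamma(alpha + y_i, beta + 1),
# so the EB posterior means shrink the counts towards the overall mean.
post_mean <- (est$alpha + y) / (est$beta + 1)
```

When the sample variance does not exceed the sample mean, the formula would give a non-positive beta, which is why the sketch returns NA in that case instead.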
[R] Help searching a matrix for only certain records
Let me start by saying I am rather new to R and generally consider myself to be a novice programmer...so don't assume I know what I'm doing :)

I have a large matrix, approximately 300,000 x 14. It's essentially a 20-year dataset of 15-minute data. However, I only need the rows where the column I've named REC.TYPE contains the string "SAO" or "FL-15".

My horribly inefficient solution was to search the matrix row by row, test the REC.TYPE column, and essentially delete the row if it did not match my criteria. Essentially...

    j <- 1
    for (i in 1:nrow(dataset)) {
      if (dataset$REC.TYPE[j] != "SAO" && dataset$REC.TYPE[j] != "FL-15") {
        dataset <- dataset[-j,]
      } else {
        j <- j+1
      }
    }

After watching my code get through only about 10% of the matrix in an hour and slowing with every row...I figure there must be a more efficient way of pulling out only the records I need...especially when I need to repeat this for another 8 datasets.

Can anyone point me in the right direction?

Thanks!

Matt
[R] Kolmogorov-Smirnov: calculate p value given as input the test statistic
Dear all,

I calculate the test statistic for the KS test outside R, and wish to use R only to calculate the corresponding p-value. Is there a way of doing this? (As far as I can see, ks.test() requires raw data as input.)

Alternatively, is there a way to give ks.test() the two CDFs (two-sample test) rather than the (x, y) data vectors?

Thanks in advance,
Rani
Re: [R] Help searching a matrix for only certain records
Try this:

    dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
Re: [R] Kolmogorov-Smirnov: calculate p value given as input the test statistic
Hello,

You can compute the p-value from the test statistic if you know the sample sizes. R calls functions written in C for the several cases. For the two-sample case, this is the code (edited):

    n.x <- 100  # length of 1st sample
    n.y <- 100  # length of 2nd sample
    STATISTIC <- 1.23

    PVAL <- 1 - .C("psmirnov2x", p = as.double(STATISTIC), as.integer(n.x), as.integer(n.y))$p
    PVAL <- min(1.0, max(0.0, PVAL))

For the other cases, check the source, file stats/ks.test.R.

As for the second question, I believe the answer is no; you must provide at least one sample and a CDF. Something like

    x <- rnorm(100)
    f <- ecdf(rnorm(100))
    ks.test(x, f)

Hope this helps,

Rui Barradas
Re: [R] caret pls model statistics
Thank you for your response, Max. Is there some literature on which you base that statement? I am confused, as I have seen many publications that report R^2 and Q^2 following PLSDA analysis. The analysis usually is to discriminate groups (i.e. classification). Are these papers incorrect in using these statistics?

Regards,
Charles

On Sat, Mar 2, 2013 at 10:39 PM, Max Kuhn mxk...@gmail.com wrote:
> Charles,
>
> You should not be treating the classes as numeric (is virginica really three times setosa?). Q^2 and/or R^2 are not appropriate for classification.
>
> Max
>
> On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr deter...@umn.edu wrote:
> > I have discovered one of my errors. The timematrix was unnecessary and an unfortunate habit I brought from another package. The following provides the same R2 values, as it should; however, I still don't know how to retrieve Q2 values. Any insight would again be appreciated:
> >
> >     library(caret)
> >     library(pls)
> >     data(iris)
> >
> >     # needed to convert to numeric in order to do regression
> >     # I don't fully understand this but if I left as a factor I would get an
> >     # error following the summary function
> >     iris$Species = as.numeric(iris$Species)
> >     inTrain1 = createDataPartition(y=iris$Species, p=.75, list=FALSE)
> >     training1 = iris[inTrain1,]
> >     testing1 = iris[-inTrain1,]
> >
> >     ctrl1 = trainControl(method="cv", number=10)
> >     plsFit2 = train(Species~., data=training1, method="pls", trControl=ctrl1,
> >                     metric="Rsquared", preProc=c("scale"))
> >
> >     data(iris)
> >     training1 = iris[inTrain1,]
> >     datvars = training1[,1:4]
> >     dat.sc = scale(datvars)
> >
> >     pls.dat = plsr(as.numeric(training1$Species)~dat.sc, ncomp=3,
> >                    method="oscorespls", data=training1)
> >
> >     x = crossval(pls.dat, segments=10)
> >
> >     summary(x)
> >     summary(plsFit2)
> >
> > Regards,
> > Charles
> >
> > On Sat, Mar 2, 2013 at 3:55 PM, Charles Determan Jr deter...@umn.edu wrote:
> > > Greetings,
> > >
> > > I have been exploring the use of the caret package to conduct some plsda modeling. Previously, I have come across methods that result in an R2 and Q2 for the model. Using the 'iris' data set, I wanted to see if I could accomplish this with the caret package. I use the following code:
> > >
> > >     library(caret)
> > >     data(iris)
> > >
> > >     # needed to convert to numeric in order to do regression
> > >     # I don't fully understand this but if I left as a factor I would get an
> > >     # error following the summary function
> > >     iris$Species = as.numeric(iris$Species)
> > >     inTrain1 = createDataPartition(y=iris$Species, p=.75, list=FALSE)
> > >     training1 = iris[inTrain1,]
> > >     testing1 = iris[-inTrain1,]
> > >
> > >     ctrl1 = trainControl(method="cv", number=10)
> > >     plsFit2 = train(Species~., data=training1, method="pls", trControl=ctrl1,
> > >                     metric="Rsquared", preProc=c("scale"))
> > >
> > >     data(iris)
> > >     training1 = iris[inTrain1,]
> > >     datvars = training1[,1:4]
> > >     dat.sc = scale(datvars)
> > >
> > >     n = nrow(dat.sc)
> > >     dat.indices = seq(1,n)
> > >
> > >     timematrix = with(training1, classvec2classmat(Species[dat.indices]))
> > >
> > >     pls.dat = plsr(timematrix ~ dat.sc, ncomp=3, method="oscorespls", data=training1)
> > >
> > >     x = crossval(pls.dat, segments=10)
> > >
> > >     summary(x)
> > >     summary(plsFit2)
> > >
> > > I see two different R2 values and I cannot figure out how to get the Q2 value. Any insight as to what my errors may be would be appreciated.
> > >
> > > Regards,
> > > Charles

-- 
Charles Determan
Integrated Biosciences PhD Student
University of Minnesota
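As a rough illustration of Max's point about not treating the classes as numeric, the sketch below keeps Species as a factor, so that caret fits a classification model and resamples on accuracy instead of R^2. The object names (`in_train`, `pls_fit`, and so on) and the seed are arbitrary; the caret calls themselves (createDataPartition, trainControl, train, predict, confusionMatrix) are the package's documented functions. This is one possible setup, not necessarily what either poster would recommend.

```r
library(caret)   # method = "pls" also needs the pls package installed

data(iris)
set.seed(1)      # arbitrary seed, only to make the split repeatable

# Species stays a factor, so train() treats this as classification.
in_train <- createDataPartition(y = iris$Species, p = 0.75, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

ctrl <- trainControl(method = "cv", number = 10)
pls_fit <- train(Species ~ ., data = training, method = "pls",
                 trControl = ctrl, metric = "Accuracy",
                 preProcess = c("center", "scale"))

# Held-out performance is summarised by a confusion matrix, not R^2/Q^2.
pred <- predict(pls_fit, newdata = testing)
confusionMatrix(pred, testing$Species)
```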
Re: [R] Errors-In-Variables in R
Dear Cedric,

If I understand correctly what you want to do, and if you're willing to assume that the variables are normally distributed, then you should be able to use any of the latent-variable structural-equation-modeling packages in R, such as sem, OpenMx, or lavaan. Here's an artificial example using the sem package:

-------- snip --------

> set.seed(12345)
> zeta <- rnorm(1000)
> y <- 1 + 2*zeta + rnorm(1000, 0, 1)
> x <- zeta + rnorm(1000)
> plot(x, y)
> Data <- data.frame(x, y)

> summary(lm(y ~ x))  # biased

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-6.6339 -1.1406  0.0299  1.1573  6.5652

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.04007    0.05514   18.86   <2e-16 ***
x            1.06089    0.04012   26.44   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.743 on 998 degrees of freedom
Multiple R-squared: 0.4119,  Adjusted R-squared: 0.4113
F-statistic: 699.1 on 1 and 998 DF,  p-value: < 2.2e-16

> plot(x, y)  # not shown

> library(sem)

> eqns <- specifyEquations()
1: y = alpha*Intercept + beta*zeta
2: x = 1*zeta
3: V(y) = sigma
4: V(x) = 1
5: V(zeta) = phi
6:
Read 5 items

> model <- sem(eqns, data=Data, raw=TRUE, fixed.x="Intercept")

> summary(model)

Model fit to raw moment matrix.

 Model Chisquare =  0.2264654   Df =  1 Pr(>Chisq) = 0.6341572
 AIC =  8.226465
 BIC =  -6.68129

 Normalized Residuals
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.1635  0.1711  0.2189  0.2564  0.4759

 Parameter Estimates
      Estimate  Std Error  z value   Pr(>|z|)
alpha 1.0400668 0.05507397 18.884905 1.518098e-79 y <--- Intercept
beta  2.2553406 0.14197058 15.885971 7.926103e-57 y <--- zeta
sigma 0.6404697 0.25612060  2.500657 1.239632e-02 y <--> y
phi   0.8881856 0.08444223 10.518263 7.117323e-26 zeta <--> zeta

 Iterations =  15

> library(car)
> linearHypothesis(model, c("alpha = 1", "beta = 2", "sigma = 1", "phi = 1"))  # true parameter values

Linear hypothesis test

Hypothesis:
alpha = 1
beta = 2
sigma = 1
phi = 1

Model 1: restricted model
Model 2: model

  Res.Df Df  Chisq Pr(>Chisq)
1      5
2      1  4 3.8285     0.4297

-------- snip --------

For other distributional assumptions, you'd have to write your own objective function, but you may still be able to use sem or one of the other SEM packages.

I hope this helps,
John

---
John Fox
Senator McMaster Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Cedric Sodhi
Sent: Saturday, March 02, 2013 4:56 PM
To: Rui Barradas
Cc: r-help@r-project.org
Subject: Re: [R] Errors-In-Variables in R

Perhaps it would have been clearer that this is no homework if I hadn't forgotten to say what [1] is. Sorry for that.

[1] https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15225

(This is no homework but genuinely addresses the problem that R, to my knowledge, does not have models for errors in variables.)

On Sat, Mar 02, 2013 at 09:34:21PM +0000, Rui Barradas wrote:
> There's a no homework policy in R-help.
>
> Rui Barradas
>
> Em 02-03-2013 18:28, Cedric Sodhi escreveu:
> > In reference to [1], how would you solve the following regression problem:
> >
> > Given observations (X_i, Y_i) with known respective error distributions (e_X_i, e_Y_i) (say, 0-mean Gaussian with known STD), find the parameters a and b which maximize the likelihood of
> >
> >     Y = a*X + b
> >
> > Taking the example further, how many of the very simplified assumptions from the above example can be lifted or eased such that R still has a method for finding an errors-in-variables fit?
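For the simplest case in this thread (one predictor, measurement-error variance of x known), the attenuation in lm() and a classical moment correction can be sketched in base R. This is a swapped-in textbook technique, not the sem fit shown above: it assumes the error variance of x is known to be 1 (as in the constraint V(x) = 1) and that the errors are independent of the latent variable. The names `sigma2_u`, `beta_naive` and `beta_corrected` are made up for illustration.

```r
# Same artificial data as in the sem example above.
set.seed(12345)
zeta <- rnorm(1000)
y <- 1 + 2 * zeta + rnorm(1000, 0, 1)
x <- zeta + rnorm(1000)

sigma2_u <- 1   # assumed known measurement-error variance of x

# Ordinary least squares slope: attenuated towards zero.
beta_naive <- cov(x, y) / var(x)

# Moment correction: remove the error variance from var(x) before dividing.
beta_corrected  <- cov(x, y) / (var(x) - sigma2_u)
alpha_corrected <- mean(y) - beta_corrected * mean(x)

c(naive = beta_naive, corrected = beta_corrected, intercept = alpha_corrected)
```

The corrected slope should land much closer to the true value of 2 than the lm() slope does; unlike the SEM approach it gives no standard errors and breaks down when var(x) is not clearly larger than the assumed error variance.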
Re: [R] Kolmogorov-Smirnov: calculate p value given as input the test statistic
On 03/03/2013 09:58, Rani Elkon wrote:
> Dear all,
>
> I calculate the test statistic for the KS test outside R, and wish to use R only to calculate the corresponding p-value.

There is no public way to do this in R. But you can read the code of ks.test, see how it does it, and extract the code you need.

Note that ks.test covers several cases and hence has several branches of code to compute p values. Also (and this is one good reason why there is not a public interface), the internal code differs by version of R (so another answer I have just seen is wrong for pre-3.0.0).

> Is there a way for doing this? (as far as I see, ks.test() requires raw data as input).
>
> Alternatively, is there a way to provide the ks.test() the two CDFs (two samples test) rather than the (x, y) data vectors?

Yes, because if you have the CDF you can recover the sorted data vector, which is all you need.

> Thanks in advance,
> Rani

-- 
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
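One version-independent route, sketched here with base R only, is to evaluate the classical asymptotic (Kolmogorov) series for the two-sided two-sample statistic directly. This is not the code ks.test runs internally, and it is only an approximation: for small samples ks.test may use an exact computation instead, so results can differ. The function name `ks2_pvalue_asymptotic` and its `terms` argument are made up for illustration, and D is assumed to be the usual statistic in [0, 1] (the largest gap between the two empirical CDFs).

```r
# Asymptotic two-sided p-value for the two-sample KS statistic D:
#   p = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lambda^2),
#   lambda = D * sqrt(n.x * n.y / (n.x + n.y)).
ks2_pvalue_asymptotic <- function(D, n.x, n.y, terms = 100) {
  lambda <- D * sqrt(n.x * n.y / (n.x + n.y))
  # For very small lambda the series converges slowly, while the true
  # p-value is 1 to many decimal places, so return 1 directly.
  if (lambda < 0.2) return(1)
  k <- seq_len(terms)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * lambda^2))
  min(1, max(0, p))
}

ks2_pvalue_asymptotic(D = 0.2, n.x = 100, n.y = 100)
```

A quick sanity check is to compare against ks.test(x, y, exact = FALSE) on simulated data, feeding its reported statistic back into the function.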
Re: [R] Help searching a matrix for only certain records
Hi,

Try this:

    set.seed(51)
    mat1 <- as.matrix(data.frame(
      REC.TYPE = sample(c("SAO", "FAO", "FL-1", "FL-2", "FL-15"), 20, replace=TRUE),
      Col2 = rnorm(20), Col3 = runif(20), stringsAsFactors=FALSE))
    dat1 <- as.data.frame(mat1, stringsAsFactors=FALSE)
    dat1[grepl("SAO|FL-15", dat1$REC.TYPE),]
    #    REC.TYPE        Col2       Col3
    #4      FL-15 -1.31594143 0.41193183
    #6      FL-15  0.43419586 0.96004780
    #9      FL-15 -0.90690732 0.84000657
    #10       SAO  0.21363265 0.20155142
    #13       SAO -0.55566727 0.71606558
    #15       SAO -0.71533068 0.90851364
    #17       SAO  1.58611036 0.97475674
    #20       SAO -0.42904914 0.33710578

A.K.
[R] Random Sample with constraints
Dear R friends,

I'd like to generate a random sample (variable size and range) without a specified distribution but with a given mean and standard deviation. Could you help me?

Thanks in advance,
Angelo
Re: [R] Random Sample with constraints
Angelo Scozzarella Tiscali angeloscozzarella at tiscali.it writes:
> Dear R friends, I'd like to generate random sample (variable size and range) without a specified distribution but with given mean and standard deviation. Could you help me?

The problem is underspecified, so no, we can't. Any random sample will by definition be a sample from _some_ distribution. If you give more context, someone might be able to help you with a solution.

Ben Bolker
Re: [R] Help searching a matrix for only certain records
There are way more efficient ways of doing many of the operations, but you probably won't see any differences unless you have very large objects (several hundred thousand entries) or have to do it a lot of times. My background is in computer performance, and for the most part I have found that the easiest/most straightforward ways are fine most of the time.

A more efficient way might be:

    testdata <- testdata[match(testdata$REC.TYPE, c('SAO ', 'FL-15'), nomatch = 0) != 0, ]

You can always use 'system.time' to determine how long actions take.

For multiple comparisons use %in%.

Sent from my iPad

On Mar 3, 2013, at 9:22, Matt Borkowski mathias1...@yahoo.com wrote:
> Thank you for your response Jim! I will give this one a try! But a couple of follow-up questions...
>
> In my search for a solution, I had seen something stating that match() is much more efficient than subset() and will cut down significantly on computing time. Is there any truth to that?
>
> Also, I found the following solution, which works for matching a single condition, but I couldn't quite figure out how to modify it to search for both my acceptable conditions...
>
>     testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]
>
> -Matt
Re: [R] Random Sample with constraints
For example, I want to simulate different populations with the same mean and standard deviation but different distributions.
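One way to read this follow-up, sketched below, is to draw from several distributions of your own choosing and then shift and scale each sample so that its sample mean and standard deviation hit the targets exactly. As noted earlier in the thread, some distribution still has to be chosen for each population; the three used here, the helper name `rescale_to_moments`, and the target values 10 and 2 are arbitrary illustrations. The rescaling also changes the range of each sample, so it does not by itself satisfy a range constraint.

```r
# Shift and scale a sample so its sample mean and sd equal the targets.
rescale_to_moments <- function(z, target_mean, target_sd) {
  target_mean + target_sd * (z - mean(z)) / sd(z)
}

set.seed(42)   # arbitrary seed for repeatability
n <- 200

pop_normal  <- rescale_to_moments(rnorm(n), 10, 2)   # symmetric, bell-shaped
pop_uniform <- rescale_to_moments(runif(n), 10, 2)   # flat
pop_skewed  <- rescale_to_moments(rexp(n),  10, 2)   # right-skewed

# All three share the same sample mean and sd but differ in shape.
sapply(list(normal = pop_normal, uniform = pop_uniform, skewed = pop_skewed),
       function(s) c(mean = mean(s), sd = sd(s)))
```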
Re: [R] Help searching a matrix for only certain records
If you are using matrices, then here are several ways of doing it for size 300,000. You can determine if the difference of 0.1 seconds is important in terms of the performance you are after. It is taking you more time to type in the statements than it is taking them to execute:

    > n <- 300000
    > testdata <- matrix(
    +     sample(c("SAO ", "FL-15", "Other"), n, TRUE, prob = c(1, 2, 1000))
    +     , nrow = n
    +     , dimnames = list(NULL, "REC.TYPE")
    +     )
    > table(testdata[, "REC.TYPE"])

     FL-15  Other   SAO
       562 299151   287
    > system.time(x1 <- subset(testdata, grepl("(SAO |FL-15)", testdata[, "REC.TYPE"])))
       user  system elapsed
       0.17    0.00    0.17
    > system.time(x2 <- subset(testdata, testdata[, "REC.TYPE"] %in% c("SAO ", "FL-15")))
       user  system elapsed
       0.05    0.00    0.05
    > system.time(x3 <- testdata[match(testdata[, "REC.TYPE"]
    +                                 , c("SAO ", "FL-15")
    +                                 , nomatch = 0) != 0
    +                           ,, drop = FALSE]
    +     )
       user  system elapsed
       0.03    0.00    0.03
    > identical(x1, x2)
    [1] TRUE
    > identical(x2, x3)
    [1] TRUE

On Sun, Mar 3, 2013 at 11:22 AM, Jim Holtman jholt...@gmail.com wrote:

> there are way more efficient ways of doing many of the operations, but you probably won't see any differences unless you have very large objects (several hundred thousand entries), or have to do it a lot of times. My background is in computer performance and for the most part I have found that the easiest/most straightforward ways are fine most of the time.
>
> a more efficient way might be:
>
>     testdata <- testdata[match(c('SAO ', 'FL-15'), testdata$REC.TYPE), ]
>
> you can always use 'system.time' to determine how long actions take.
>
> for multiple comparisons use %in%
>
> Sent from my iPad

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
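The two argument orders of match() seen in this thread do different things, which is worth checking before timing them. The following is a minimal sketch on made-up toy data (the matrix `m` and its values are illustrative assumptions, not from the thread): match(wanted, column) gives only the first position of each wanted value, while match(column, wanted, nomatch = 0) != 0, like column %in% wanted, flags every matching row.

```r
# Toy one-column character matrix (values are made up for illustration)
m <- matrix(c("SAO", "Other", "FL-15", "SAO", "Other", "FL-15"),
            ncol = 1, dimnames = list(NULL, "REC.TYPE"))
wanted <- c("SAO", "FL-15")

# Looks up each wanted value in the column: one (first) position per value
first_only <- match(wanted, m[, "REC.TYPE"])

# Looks up each row's value in the wanted set: one flag per row
keep_a <- match(m[, "REC.TYPE"], wanted, nomatch = 0) != 0
keep_b <- m[, "REC.TYPE"] %in% wanted

rows_first <- m[first_only, , drop = FALSE]  # 2 rows only
rows_all   <- m[keep_a, , drop = FALSE]      # every SAO / FL-15 row
```

On this toy input the first form keeps two rows and the second keeps four, so only the second form (or %in%) is a row filter.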
Re: [R] Random Sample with constraints
On 03-Mar-2013 16:29:05 Angelo Scozzarella Tiscali wrote:
> For example, I want to simulate different populations with the same mean and standard deviation but different distributions.
>
> On 3 Mar 2013, at 17:14, Angelo Scozzarella Tiscali wrote:
>
>> Dear R friends,
>>
>> I'd like to generate a random sample (variable size and range) without a specified distribution but with given mean and standard deviation.
>>
>> Could you help me?
>>
>> thanks in advance
>>
>> Angelo

As Ben Bolker said, any random sample must come from some distribution, so you cannot generate one without specifying some distribution.

Insofar as your question can be interpreted, it will be satisfied if, given the desired mean, M, and SD, S, you take any two available distributions with, respectively, known means M1 and M2 and known SDs S1 and S2. Let X1 denote a sample from the first, and X2 a sample from the second.

Then M + (X1 - M1)/(S1/S) is a sample from the first distribution re-scaled to have mean M and SD S, as required.

Similarly, M + (X2 - M2)/(S2/S) is a sample from the second distribution re-scaled to have mean M and SD S, as required.

As for which distribution you sample from first, and which second, that is your own choice -- for example, the first could be the Standard Normal (M1 = 0, S1 = 1); use rnorm(). The second could be the uniform on (0,1) (M2 = 0.5, S2 = 1/sqrt(12)); use runif().

Similar for other arbitrary choices of first and second distribution (so long as each has at least a second moment, hence excluding, for example, the Cauchy distribution).

That's about as far as one can go with your question!

Hoping it helps, however.
Ted.

-------------------------------------------------
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 03-Mar-2013  Time: 17:12:50
This message was sent by XFMail
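The re-scaling described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: `rescale_to` is a hypothetical helper name, and the target values M = 10 and S = 2 are made up. The underlying distribution of the result has mean M and SD S; the sample mean and SD of any one draw will only be close to those values.

```r
# Shift and scale a sample from a distribution with known mean `m_from`
# and SD `s_from` so that the distribution has mean `M` and SD `S`.
# (`rescale_to` is a hypothetical helper, not something from the thread.)
rescale_to <- function(x, m_from, s_from, M, S) {
  M + S * (x - m_from) / s_from
}

set.seed(1)
M <- 10  # assumed target mean
S <- 2   # assumed target SD

x_norm <- rescale_to(rnorm(1000), 0, 1, M, S)               # Standard Normal
x_unif <- rescale_to(runif(1000), 0.5, 1 / sqrt(12), M, S)  # Uniform(0, 1)

c(mean(x_norm), sd(x_norm))
c(mean(x_unif), sd(x_unif))
```

Both samples share the target mean and SD in distribution while keeping the shape (bell-shaped versus flat) of the distribution they came from.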
Re: [R] Help searching a matrix for only certain records
Hi,

You could also use ?data.table()

    n <- 300000
    set.seed(51)
    mat1 <- as.matrix(data.frame(REC.TYPE = sample(c("SAO", "FAO", "FL-1", "FL-2", "FL-15"), n, replace = TRUE),
                                 Col2 = rnorm(n), Col3 = runif(n), stringsAsFactors = FALSE))
    dat1 <- as.data.frame(mat1, stringsAsFactors = FALSE)
    table(mat1[, 1])
    #
    #  FAO  FL-1 FL-15  FL-2   SAO
    #60046 60272 59669 59878 60135

    system.time(x1 <- subset(mat1, grepl("(SAO|FL-15)", mat1[, "REC.TYPE"])))
    #   user  system elapsed
    #  0.076   0.004   0.082

    system.time(x2 <- subset(mat1, mat1[, "REC.TYPE"] %in% c("SAO", "FL-15")))
    #   user  system elapsed
    #  0.028   0.000   0.030

    system.time(x3 <- mat1[match(mat1[, "REC.TYPE"]
                                 , c("SAO", "FL-15")
                                 , nomatch = 0) != 0
                           ,, drop = FALSE]
    )
    #   user  system elapsed
    #  0.028   0.000   0.028

    table(x3[, 1])
    #
    #FL-15   SAO
    #59669 60135

    library(data.table)
    dat2 <- data.table(dat1)
    system.time(x4 <- dat2[match(REC.TYPE, c("SAO", "FL-15"), nomatch = 0) != 0, , drop = FALSE])
    #   user  system elapsed
    #  0.024   0.000   0.025

    table(x4$REC.TYPE)
    #FL-15   SAO
    #59669 60135

A.K.
Re: [R] Help searching a matrix for only certain records
Thank you for your response Jim! I will give this one a try! But a couple followup questions...

In my search for a solution, I had seen something stating match() is much more efficient than subset() and will cut down significantly on computing time. Is there any truth to that?

Also, I found the following solution which works for matching a single condition, but I couldn't quite figure out how to modify it to search for both my acceptable conditions...

    testdata <- testdata[testdata$REC.TYPE == "SAO", , drop = FALSE]

-Matt

--- On Sun, 3/3/13, jim holtman jholt...@gmail.com wrote:

> From: jim holtman jholt...@gmail.com
> Subject: Re: [R] Help searching a matrix for only certain records
> To: Matt Borkowski mathias1...@yahoo.com
> Cc: r-help@r-project.org
> Date: Sunday, March 3, 2013, 8:00 AM
>
> Try this:
>
>     dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))
>
> On Sun, Mar 3, 2013 at 1:11 AM, Matt Borkowski mathias1...@yahoo.com wrote:
>
>> Let me start by saying I am rather new to R and generally consider myself to be a novice programmer...so don't assume I know what I'm doing :)
>>
>> I have a large matrix, approximately 300,000 x 14. It's essentially a 20-year dataset of 15-minute data. However, I only need the rows where the column I've named REC.TYPE contains the string "SAO" or "FL-15".
>>
>> My horribly inefficient solution was to search the matrix row by row, test the REC.TYPE column and essentially delete the row if it did not match my criteria. Essentially...
>>
>>     j <- 1
>>     for (i in 1:nrow(dataset)) {
>>       if (dataset$REC.TYPE[j] != "SAO" && dataset$REC.TYPE[j] != "FL-15") {
>>         dataset <- dataset[-j, ]
>>       } else {
>>         j <- j + 1
>>       }
>>     }
>>
>> After watching my code get through only about 10% of the matrix in an hour and slowing with every row...I figure there must be a more efficient way of pulling out only the records I need...especially when I need to repeat this for another 8 datasets. Can anyone point me in the right direction?
>>
>> Thanks!
>> Matt
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
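For the follow-up question about matching two acceptable values instead of one, a small sketch may help. It uses a made-up toy data frame (the column `VALUE` and all values are assumptions for illustration) and shows two equivalent ways to build the row selector: the vectorised OR operator `|`, or `%in%` with the set of accepted values.

```r
# Toy data frame standing in for the real dataset (values are made up)
testdata <- data.frame(
  REC.TYPE = c("SAO", "Other", "FL-15", "Other", "SAO"),
  VALUE    = c(1.2, 3.4, 5.6, 7.8, 9.0),
  stringsAsFactors = FALSE
)

# Option 1: combine two equality tests with the vectorised OR operator
keep_or <- testdata$REC.TYPE == "SAO" | testdata$REC.TYPE == "FL-15"

# Option 2: test membership in a set of accepted values
keep_in <- testdata$REC.TYPE %in% c("SAO", "FL-15")

# Either logical vector selects the same rows
filtered <- testdata[keep_in, , drop = FALSE]
```

The `%in%` form tends to scale better as the list of accepted values grows, since adding a value means extending the vector rather than adding another comparison.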
[R] distribution functions and lists
Hello everyone,

I have a quick question but I am stuck with it and I do not know how to solve it.

Imagine I need the distribution function of a Weibull(1,1) at t=3, then I will write pweibull(3,1,1). I want to keep the shape and scale parameters in a list (or a vector or whatever). Then I have

    parameters <- list(shape=1, scale=1)

but when I write pweibull(3,parameters) I get the following error:

    Error in pweibull(q, shape, scale, lower.tail, log.p) :
      Non-numeric argument to mathematical function

I have to write pweibull(3,parameters[[1]],parameters[[2]]) but I am very interested in being able to write pweibull(3,parameters).

Does anyone know how to solve it?

Thank you very much,
Oleguer Plana
Re: [R] distribution functions and lists
On Sunday, 3 March 2013 at 19:49 +0100, Oleguer Plana Ripoll wrote:
> Hello everyone,
>
> I have a quick question but I am stuck with it and I do not know how to solve it.
>
> Imagine I need the distribution function of a Weibull(1,1) at t=3, then I will write pweibull(3,1,1). I want to keep the shape and scale parameters in a list (or a vector or whatever). Then I have
>
>     parameters <- list(shape=1, scale=1)
>
> but when I write pweibull(3,parameters) I get the following error:
>
>     Error in pweibull(q, shape, scale, lower.tail, log.p) :
>       Non-numeric argument to mathematical function
>
> I have to write pweibull(3,parameters[[1]],parameters[[2]]) but I am very interested in being able to write pweibull(3,parameters).
>
> Does anyone know how to solve it?

What you are looking for is do.call():

    parameters <- list(q=3, shape=1, scale=1)
    do.call(pweibull, parameters)

or

    parameters <- list(shape=1, scale=1)
    do.call(pweibull, c(q=3, parameters))

Regards

> Thank you very much,
> Oleguer Plana
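As a quick check of the do.call() suggestion, the sketch below compares a direct pweibull() call with the same call assembled from a named list. The variable names `direct`, `via_list` and `via_full` are illustrative only. For shape = 1 and scale = 1 the Weibull CDF reduces to 1 - exp(-q), which gives an independent value to compare against.

```r
parameters <- list(shape = 1, scale = 1)

# Direct call with the parameters written out
direct <- pweibull(3, shape = 1, scale = 1)

# Same call, with q prepended to the stored parameter list
via_list <- do.call(pweibull, c(list(q = 3), parameters))

# Same call, with q stored in the list from the start
via_full <- do.call(pweibull, list(q = 3, shape = 1, scale = 1))

# Closed form for Weibull(shape = 1, scale = 1): 1 - exp(-q)
closed_form <- 1 - exp(-3)
```

do.call() matches list names to argument names, so the order of the list elements does not matter as long as each element is named.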
Re: [R] caret pls model statistics
I was under the impression that in PLS analysis, R2 was calculated by 1 - (Residual sum of squares) / (Sum of squares). Is this still what you are referring to? I am aware of the linear R2 which is how well two variables are correlated but the prior equation seems different to me. Could you explain if this is the same concept?

Charles

On Sun, Mar 3, 2013 at 12:46 PM, Max Kuhn mxk...@gmail.com wrote:

>> Is there some literature that you make that statement?
>
> No, but there isn't literature on changing a lightbulb with a duck either.
>
>> Are these papers incorrect in using these statistics?
>
> Definitely, if they convert 3+ categories to integers (but there are specialized R^2 metrics for binary classification models). Otherwise, they are just using an ill-suited score.
>
> How would you explain such an R^2 value to someone? R^2 is a function of correlation between the two random variables. For two classes, one of them is binary. What does it mean?
>
> Historically, models rooted in computer science (eg neural networks) used RMSE or SSE to fit models with binary outcomes and that *can* work well.
>
> However, I don't think that communicating R^2 is effective. Other metrics (e.g. accuracy, Kappa, area under the ROC curve, etc) are designed to measure the ability of a model to classify and work well. With 3+ categories, I tend to use Kappa.
>
> Max
>
> On Sun, Mar 3, 2013 at 10:53 AM, Charles Determan Jr deter...@umn.edu wrote:
>
>> Thank you for your response Max. Is there some literature that you make that statement? I am confused as I have seen many publications that contain R^2 and Q^2 following PLSDA analysis. The analysis usually is to discriminate groups (ie. classification). Are these papers incorrect in using these statistics?
>>
>> Regards,
>> Charles
>>
>> On Sat, Mar 2, 2013 at 10:39 PM, Max Kuhn mxk...@gmail.com wrote:
>>
>>> Charles,
>>>
>>> You should not be treating the classes as numeric (is virginica really three times setosa?). Q^2 and/or R^2 are not appropriate for classification.
>>>
>>> Max
>>>
>>> On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr deter...@umn.edu wrote:
>>>
>>>> I have discovered one of my errors. The timematrix was unnecessary and an unfortunate habit I brought from another package. The following provides the same R2 values as it should, however, I still don't know how to retrieve Q2 values. Any insight would again be appreciated:
>>>>
>>>>     library(caret)
>>>>     library(pls)
>>>>     data(iris)
>>>>
>>>>     #needed to convert to numeric in order to do regression
>>>>     #I don't fully understand this but if I left as a factor I would get an error following the summary function
>>>>     iris$Species=as.numeric(iris$Species)
>>>>     inTrain1=createDataPartition(y=iris$Species, p=.75, list=FALSE)
>>>>     training1=iris[inTrain1,]
>>>>     testing1=iris[-inTrain1,]
>>>>
>>>>     ctrl1=trainControl(method="cv", number=10)
>>>>
>>>>     plsFit2=train(Species~., data=training1, method="pls", trControl=ctrl1, metric="Rsquared", preProc=c("scale"))
>>>>
>>>>     data(iris)
>>>>     training1=iris[inTrain1,]
>>>>     datvars=training1[,1:4]
>>>>     dat.sc=scale(datvars)
>>>>
>>>>     pls.dat=plsr(as.numeric(training1$Species)~dat.sc, ncomp=3, method="oscorespls", data=training1)
>>>>
>>>>     x=crossval(pls.dat, segments=10)
>>>>
>>>>     summary(x)
>>>>     summary(plsFit2)
>>>>
>>>> Regards,
>>>> Charles
>>>>
>>>> On Sat, Mar 2, 2013 at 3:55 PM, Charles Determan Jr deter...@umn.edu wrote:
>>>>
>>>>> Greetings,
>>>>>
>>>>> I have been exploring the use of the caret package to conduct some plsda modeling. Previously, I have come across methods that result in a R2 and Q2 for the model. Using the 'iris' data set, I wanted to see if I could accomplish this with the caret package. I use the following code:
>>>>>
>>>>>     library(caret)
>>>>>     data(iris)
>>>>>
>>>>>     #needed to convert to numeric in order to do regression
>>>>>     #I don't fully understand this but if I left as a factor I would get an error following the summary function
>>>>>     iris$Species=as.numeric(iris$Species)
>>>>>     inTrain1=createDataPartition(y=iris$Species, p=.75, list=FALSE)
>>>>>     training1=iris[inTrain1,]
>>>>>     testing1=iris[-inTrain1,]
>>>>>
>>>>>     ctrl1=trainControl(method="cv", number=10)
>>>>>
>>>>>     plsFit2=train(Species~., data=training1, method="pls", trControl=ctrl1, metric="Rsquared", preProc=c("scale"))
>>>>>
>>>>>     data(iris)
>>>>>     training1=iris[inTrain1,]
>>>>>     datvars=training1[,1:4]
>>>>>     dat.sc=scale(datvars)
>>>>>
>>>>>     n=nrow(dat.sc)
>>>>>     dat.indices=seq(1,n)
>>>>>
>>>>>     timematrix=with(training1, classvec2classmat(Species[dat.indices]))
>>>>>
>>>>>     pls.dat=plsr(timematrix ~ dat.sc, ncomp=3, method="oscorespls", data=training1)
>>>>>
>>>>>     x=crossval(pls.dat, segments=10)
>>>>>
>>>>>     summary(x)
>>>>>     summary(plsFit2)
>>>>>
>>>>> I see two different R2 values and I cannot figure out how to get the Q2 value. Any insight as to what my errors may be would be appreciated.
>>>>>
>>>>> Regards,
>>>>>
>>>>> --
>>>>> Charles
>>>>
>>>> --
>>>> Charles Determan
>>>> Integrated Biosciences PhD Student
>>>> University of Minnesota
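The classification metrics Max mentions (accuracy and Kappa) can be computed by hand from a confusion table, which may make clearer what they measure when the outcome is kept as a factor. This is a minimal base-R sketch on made-up predictions: `cohen_kappa` is a hypothetical helper implementing the usual unweighted Cohen's kappa, and the `observed`/`predicted` vectors are invented for illustration, not output of any model in this thread.

```r
# Unweighted Cohen's kappa from observed and predicted class labels.
# (`cohen_kappa` is a hypothetical helper name, not a caret or pls function.)
cohen_kappa <- function(observed, predicted) {
  lev <- union(levels(factor(observed)), levels(factor(predicted)))
  tab <- table(factor(observed, levels = lev), factor(predicted, levels = lev))
  n <- sum(tab)
  p_obs <- sum(diag(tab)) / n                       # observed agreement
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

# Made-up labels: one versicolor is misclassified as virginica
observed  <- factor(c("setosa", "setosa", "versicolor",
                      "versicolor", "virginica", "virginica"))
predicted <- factor(c("setosa", "setosa", "versicolor",
                      "virginica", "virginica", "virginica"))

accuracy  <- mean(as.character(observed) == as.character(predicted))
kappa_hat <- cohen_kappa(observed, predicted)
```

Neither number depends on any numeric coding of the classes, which is the point of Max's objection to as.numeric(iris$Species).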
Re: [R] distribution functions and lists
Dear Milan and other users,

Thank you for your help, it worked. The problem is that the function do.call is not ready for vectors and I need it in order to integrate it afterwards.

With the pweibull, I can write:

    pweibull(1,shape=1)
    pweibull(2,shape=1)
    pweibull(1:2,shape=1)

When I do the same with the do.call, I obtain an error:

    do.call(pweibull,c(q=1,list(shape=1,scale=1)))
    do.call(pweibull,c(q=2,list(shape=1,scale=1)))
    do.call(pweibull,c(q=1:2,list(shape=1,scale=1)))
    Error in pweibull(q1 = 1L, q2 = 2L, shape = 1, scale = 1) :
      unused argument(s) (q1 = 1, q2 = 2)

Do you know how I can solve it?

Thank you,
Oleguer
Re: [R] distribution functions and lists
On 13-03-03 3:39 PM, Oleguer Plana Ripoll wrote:
> Dear Milan and other users,
>
> Thank you for your help, it worked. The problem is that the function do.call is not ready for vectors and I need it in order to integrate it afterwards.

do.call() is fine, it's the argument list that needs fixing. Construct a list containing elements q, shape, and scale.

> With the pweibull, I can write:
>
>     pweibull(1,shape=1)
>     pweibull(2,shape=1)
>     pweibull(1:2,shape=1)
>
> When I do the same with the do.call, I obtain an error:
>
>     do.call(pweibull,c(q=1,list(shape=1,scale=1)))
>     do.call(pweibull,c(q=2,list(shape=1,scale=1)))
>     do.call(pweibull,c(q=1:2,list(shape=1,scale=1)))
>     Error in pweibull(q1 = 1L, q2 = 2L, shape = 1, scale = 1) :
>       unused argument(s) (q1 = 1, q2 = 2)
>
> Do you know how can I solve it?

    parameters <- list(shape=1, scale=1)
    do.call(pweibull, c(list(q=1:2), parameters))

Duncan Murdoch
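The error with a vector q comes from how c() builds the argument list, and a short sketch makes that visible. It also shows one possible way to use the result with integrate(), since that was the stated goal; the wrapper name `cdf` and the integration limits 0 and 1 are assumptions made for illustration.

```r
parameters <- list(shape = 1, scale = 1)

# c() splices an atomic vector element by element, so a length-2 q
# turns into two list entries named q1 and q2
bad_args <- c(q = 1:2, parameters)
names(bad_args)

# Wrapping q in list() keeps the whole vector as one argument named q
good_args <- c(list(q = 1:2), parameters)
names(good_args)

cdf_values <- do.call(pweibull, good_args)

# Hypothetical wrapper: a function of t alone, vectorised in t,
# which is what integrate() expects
cdf <- function(t) do.call(pweibull, c(list(q = t), parameters))
area <- integrate(cdf, lower = 0, upper = 1)$value
```

For shape = 1 and scale = 1 the CDF is 1 - exp(-t), so the integral over [0, 1] should be close to exp(-1), which gives a simple sanity check.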
Re: [R] caret pls model statistics
That's the most common formula, but not the only one. See

Kvålseth, T. (1985). Cautionary note about $R^2$. *American Statistician*, *39*(4), 279-285.

Traditionally, the symbol 'R' is used for the Pearson correlation coefficient and one way to calculate R^2 is... R^2.

Max

On Sun, Mar 3, 2013 at 3:16 PM, Charles Determan Jr deter...@umn.edu wrote:

> I was under the impression that in PLS analysis, R2 was calculated by 1 - (Residual sum of squares) / (Sum of squares). Is this still what you are referring to? I am aware of the linear R2 which is how well two variables are correlated but the prior equation seems different to me. Could you explain if this is the same concept?
>
> Charles
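The two R^2 definitions discussed here can be compared directly on simulated data. This is a sketch under stated assumptions: the data are simulated, the model is an ordinary least-squares fit with an intercept (not a PLS model), and the variable names are illustrative. For such a fit the two definitions agree on the training data; they stop agreeing once the predictions are biased, for example on held-out data.

```r
set.seed(42)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)
fit <- lm(y ~ x)
pred <- fitted(fit)

# Definition 1: 1 - residual sum of squares / total sum of squares
r2_ss <- 1 - sum((y - pred)^2) / sum((y - mean(y))^2)

# Definition 2: squared Pearson correlation of observed and predicted
r2_cor <- cor(y, pred)^2

# Predictions with a constant bias: the correlation is unchanged,
# but the sum-of-squares version is penalised
pred_shifted <- pred + 1
r2_ss_shifted  <- 1 - sum((y - pred_shifted)^2) / sum((y - mean(y))^2)
r2_cor_shifted <- cor(y, pred_shifted)^2
```

This is one reason different packages can report different "R2" values for what looks like the same model: the correlation-based version ignores systematic offsets that the sum-of-squares version counts as error.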
[R] Creating 3d partial dependence plots
Help,

I've been having a difficult time trying to create 3d partial dependence plots using rgl. It looks like this question has been asked a couple times, but I'm unable to find a clear answer googling. I've tried creating x, y, and z variables by extracting them from the partialPlot output to no avail. I've seen these plots used several times in articles, and I think they would help me a great deal looking at interactions. Could someone provide a coding example using randomForest and rgl? It would be greatly appreciated.

Thank you,
Jerrod Parker
Re: [R] Creating 3d partial dependence plots
On 13-03-03 7:08 PM, Jerrod Parker wrote:
> Help,
>
> I've been having a difficult time trying to create 3d partial dependence plots using rgl. It looks like this question has been asked a couple times, but I'm unable to find a clear answer googling. I've tried creating x, y, and z variables by extracting them from the partialPlot output to no avail. I've seen these plots used several times in articles, and I think they would help me a great deal looking at interactions. Could someone provide a coding example using randomForest and rgl? It would be greatly appreciated.

I think you are making your question too hard to answer. Show us an example of what you tried (a self-contained, minimal example, of course) and we'll suggest ways to fix it.

Duncan Murdoch
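As a possible starting point for such a minimal example, a two-variable partial dependence surface can be computed by hand with nothing more than predict(). The sketch below is an assumption-laden illustration, not the method used in any of the articles mentioned: `partial_dependence_2d` is a hypothetical helper, and an lm() fit on the built-in mtcars data stands in for the random forest so that the code runs with base R alone. It should carry over to a regression randomForest fit, whose predict() method also returns a numeric vector; a classification forest would need class probabilities instead.

```r
# Average prediction over the data with two variables fixed at grid values.
# (`partial_dependence_2d` is a hypothetical helper name.)
partial_dependence_2d <- function(fit, data, xvar, yvar, grid_n = 20) {
  xs <- seq(min(data[[xvar]]), max(data[[xvar]]), length.out = grid_n)
  ys <- seq(min(data[[yvar]]), max(data[[yvar]]), length.out = grid_n)
  z <- matrix(NA_real_, nrow = grid_n, ncol = grid_n)
  for (i in seq_along(xs)) {
    for (j in seq_along(ys)) {
      tmp <- data
      tmp[[xvar]] <- xs[i]   # overwrite the two variables of interest
      tmp[[yvar]] <- ys[j]
      z[i, j] <- mean(predict(fit, newdata = tmp))
    }
  }
  list(x = xs, y = ys, z = z)
}

# Stand-in model; replace with a randomForest regression fit if available
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
pd <- partial_dependence_2d(fit, mtcars, "wt", "hp", grid_n = 5)

# Plotting is left commented out so the sketch runs without a display:
# persp(pd$x, pd$y, pd$z, xlab = "wt", ylab = "hp", zlab = "mpg")
# rgl::persp3d(pd$x, pd$y, pd$z, col = "lightblue")  # needs the rgl package
```

The returned x, y and z have the shape that surface-plotting functions expect: two grid vectors and a matrix with one row per x value and one column per y value.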
Re: [R] Empirical Bayes Estimator for Poisson-Gamma Parameters
Did you try using MLE to approximate the marginal?

On Mar 3, 2013, at 5:26 AM, Ali A. Bromideh wrote:

Dear Nicole,

First of all, my sincere gratitude for your kind reply. As I told Mr. Gunter, this is part of my research and is not homework. However, I am going to clarify the problem. Suppose we have received an observation from a Poisson distribution, i.e. Y_1 ~ Pois(Lam_1), where Lam_1 ~ Gamma(alpha_1, beta_1). Now, what is the empirical Bayes (EB) estimate of alpha_1 and beta_1? Let Y_2 ~ Pois(Lam_2) and Lam_2 ~ Gamma(alpha_2, beta_2). Again, how can we calculate the EB estimates of alpha_2 and beta_2? In fact, I read the relevant paper by Robbins at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC350425/ but it gave 0 for Y_1. And for Var(Y) < E(Y), it generates a negative value for a positive value of alpha/beta!! Any idea?

Kind regards,

-----Original Message-----
From: Nicole Ford [mailto:nicole.f...@me.com]
Sent: Sunday, March 03, 2013 4:09 AM
To: Bert Gunter
Cc: Boroumideh-Ali Akbar; r-help@r-project.org
Subject: Re: [R] Empirical Bayes Estimator for Poisson-Gamma Parameters

also, kruschke at indiana has some info on this, both online and youtube. (if homework.) if not, more info will be helpful. ~n

On Feb 25, 2013, at 9:41 AM, Bert Gunter wrote:

Homework? We don't do homework here. If not, search (e.g. via google -- R hierarchical Bayes -- or some such). -- Bert

On Mon, Feb 25, 2013 at 1:39 AM, Ali A. Bromideh a.bromi...@ikco.com wrote:

Dear Sir/Madam, I apologize for any cross-posting. I have a simple question, which I thought the R list might help me answer. Suppose we have Y_1, Y_2, ..., Y_n ~ Poisson(Lambda_i) and Lambda_i ~ Gamma(alpha_i, beta_i). Empirical Bayes estimators for the hyper-parameters of the gamma distribution, i.e. (alpha_t, beta_t), are needed.

    y=c(12,5,17,14)
    n=4

What about hierarchical Bayes estimators? Any relevant work and code in R (or S+) is highly appreciated.
Kind regards,
Ali

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
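The suggestion at the top of this thread (fit the marginal by maximum likelihood) can be sketched as follows. This is only an illustration under a simplifying assumption that the thread does not state: all observations share one pair of hyperparameters (alpha, beta), with beta as the gamma rate. With a separate (alpha_i, beta_i) for each single observation, the hyperparameters are not identifiable from the data. Under that assumption the marginal of Y is negative binomial with size alpha and mean alpha/beta, and the posterior of each Lambda_i is Gamma(alpha + y_i, beta + 1). The object names are hypothetical.

```r
y <- c(12, 5, 17, 14)

# Method-of-moments starting values: E(Y) = alpha/beta and
# Var(Y) = alpha/beta + alpha/beta^2, usable only when var(y) > mean(y)
m <- mean(y)
v <- var(y)
beta_mom  <- m / (v - m)
alpha_mom <- m * beta_mom

# Negative log-likelihood of the negative binomial marginal,
# parameterised on the log scale so alpha and beta stay positive
negloglik <- function(par) {
  alpha <- exp(par[1])
  beta  <- exp(par[2])
  -sum(dnbinom(y, size = alpha, mu = alpha / beta, log = TRUE))
}

fit <- optim(log(c(alpha_mom, beta_mom)), negloglik)
alpha_hat <- exp(fit$par[1])
beta_hat  <- exp(fit$par[2])

# Empirical Bayes posterior mean of each Lambda_i
post_mean <- (y + alpha_hat) / (1 + beta_hat)
```

The moment equations also show where negative estimates come from: when the sample variance is below the sample mean, `v - m` is negative and so are the moment estimates of alpha and beta.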
Re: [R] Help searching a matrix for only certain records
I appreciate all the feedback on this. I ended up using this line to solve my problem, just because I stumbled upon it first...

    alldata <- alldata[alldata$REC.TYPE == "SAO" | alldata$REC.TYPE == "FM-15", , drop=FALSE]

But I think Jim's solution would work equally well. I was a bit confused by the relative complexity of the data frames solution, as it seems like more steps than necessary. Thanks again for the input! -Matt

Again, thanks for the feedback!

--- On Sun, 3/3/13, arun smartpink...@yahoo.com wrote:

From: arun smartpink...@yahoo.com
Subject: Re: [R] Help searching a matrix for only certain records
To: Matt Borkowski mathias1...@yahoo.com
Cc: R help r-help@r-project.org, jim holtman jholt...@gmail.com
Date: Sunday, March 3, 2013, 1:29 PM

HI, You could also use ?data.table()

    n <- 300000
    set.seed(51)
    mat1 <- as.matrix(data.frame(REC.TYPE=sample(c("SAO","FAO","FL-1","FL-2","FL-15"),n,replace=TRUE),Col2=rnorm(n),Col3=runif(n),stringsAsFactors=FALSE))
    dat1 <- as.data.frame(mat1,stringsAsFactors=FALSE)
    table(mat1[,1])
    #
    #  FAO  FL-1 FL-15  FL-2   SAO
    #60046 60272 59669 59878 60135
    system.time(x1 <- subset(mat1, grepl("(SAO|FL-15)", mat1[, "REC.TYPE"])))
    #   user  system elapsed
    #  0.076   0.004   0.082
    system.time(x2 <- subset(mat1, mat1[, "REC.TYPE"] %in% c("SAO", "FL-15")))
    #   user  system elapsed
    #  0.028   0.000   0.030
    system.time(x3 <- mat1[match(mat1[, "REC.TYPE"], c("SAO", "FL-15"), nomatch = 0) != 0, , drop = FALSE])
    #   user  system elapsed
    #  0.028   0.000   0.028
    table(x3[,1])
    #
    #FL-15   SAO
    #59669 60135
    library(data.table)
    dat2 <- data.table(dat1)
    system.time(x4 <- dat2[match(REC.TYPE,c("SAO", "FL-15"),nomatch=0)!=0,,drop=FALSE])
    #   user  system elapsed
    #  0.024   0.000   0.025
    table(x4$REC.TYPE)
    #FL-15   SAO
    #59669 60135

A.K.
----- Original Message -----
From: jim holtman jholt...@gmail.com
To: Matt Borkowski mathias1...@yahoo.com
Cc: r-help@r-project.org r-help@r-project.org
Sent: Sunday, March 3, 2013 11:52 AM
Subject: Re: [R] Help searching a matrix for only certain records

If you are using matrices, then here are several ways of doing it for size 300,000. You can determine if the difference of 0.1 seconds is important in terms of the performance you are after. It is taking you more time to type in the statements than it is taking them to execute:

    > n <- 300000
    > testdata <- matrix(
    +     sample(c("SAO ", "FL-15", "Other"), n, TRUE, prob = c(1,2,1000))
    +     , nrow = n
    +     , dimnames = list(NULL, "REC.TYPE")
    +     )
    > table(testdata[, "REC.TYPE"])

     FL-15  Other   SAO
       562 299151    287
    > system.time(x1 <- subset(testdata, grepl("(SAO |FL-15)", testdata[, "REC.TYPE"])))
       user  system elapsed
       0.17    0.00    0.17
    > system.time(x2 <- subset(testdata, testdata[, "REC.TYPE"] %in% c("SAO ", "FL-15")))
       user  system elapsed
       0.05    0.00    0.05
    > system.time(x3 <- testdata[match(testdata[, "REC.TYPE"]
    +                             , c("SAO ", "FL-15")
    +                             , nomatch = 0) != 0
    +                             ,, drop = FALSE]
    +     )
       user  system elapsed
       0.03    0.00    0.03
    > identical(x1, x2)
    [1] TRUE
    > identical(x2, x3)
    [1] TRUE

On Sun, Mar 3, 2013 at 11:22 AM, Jim Holtman jholt...@gmail.com wrote:

There are far more efficient ways of doing many of the operations, but you probably won't see any differences unless you have very large objects (several hundred thousand entries), or have to do it a lot of times. My background is in computer performance, and for the most part I have found that the easiest, most straightforward ways are fine most of the time. A more efficient way might be:

    testdata <- testdata[match(c('SAO ', 'FL-15'), testdata$REC.TYPE), ]

You can always use 'system.time' to determine how long actions take. For multiple comparisons use %in%.

Sent from my iPad

On Mar 3, 2013, at 9:22, Matt Borkowski mathias1...@yahoo.com wrote:

Thank you for your response Jim! I will give this one a try! But a couple followup questions...
In my search for a solution, I had seen something stating match() is much more efficient than subset() and will cut down significantly on computing time. Is there any truth to that? Also, I found the following solution which works for matching a single condition, but I couldn't quite figure out how to modify it to search for both my acceptable conditions...

    testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]

-Matt

--- On Sun, 3/3/13, jim holtman jholt...@gmail.com wrote:

From: jim holtman jholt...@gmail.com
Subject: Re: [R] Help searching a matrix for only certain records
To: Matt Borkowski mathias1...@yahoo.com
Cc: r-help@r-project.org
Date: Sunday, March 3, 2013, 8:00 AM

Try this: dataset <- subset(dataset, grepl("(SAO
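The approaches discussed in this thread can be compared on a small, deterministic matrix. The sketch below is illustrative only: the matrix contents and object names are made up, and the record codes simply reuse those from the posts. It shows that `%in%`, the `match(...) != 0` idiom, and chained `==` comparisons select the same rows, and that the argument order of `match()` matters, since `match(values, column)` returns only the first position of each value.

```r
# Small character matrix standing in for the real data (hypothetical values)
recs <- cbind(REC.TYPE = c("SAO", "FL-15", "Other", "SAO", "Other", "FL-15"),
              VALUE    = c("1", "2", "3", "4", "5", "6"))
keep <- c("SAO", "FL-15")

# Three equivalent ways to keep every row whose REC.TYPE is in 'keep'
by_in    <- recs[recs[, "REC.TYPE"] %in% keep, , drop = FALSE]
by_match <- recs[match(recs[, "REC.TYPE"], keep, nomatch = 0) != 0, , drop = FALSE]
by_or    <- recs[recs[, "REC.TYPE"] == "SAO" | recs[, "REC.TYPE"] == "FL-15", ,
                 drop = FALSE]

# With the arguments the other way round, match() gives the FIRST position of
# each value in the column, so only one row per record type is returned
first_only <- recs[match(keep, recs[, "REC.TYPE"]), , drop = FALSE]
```

Here `by_in`, `by_match`, and `by_or` each hold the four matching rows, while `first_only` holds only two.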
Re: [R] SAS and R complement each other
I'm not sure why you posted the original note. I quit using SAS in 1991 and haven't needed it yet.

Frank

RogerJDeAngelis wrote:
Sorry about the double post. But I keep getting 'post' rejections, so I resubmitted about an hour later.

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University