Re: [R] Number of Cores limited to two in CRAN
Dear Henrik,

Thank you! This is quite helpful, especially your longer blog post.

Best regards,
Ravi

From: Henrik Bengtsson
Sent: Sunday, July 2, 2023 4:33 AM
To: Ravi Varadhan
Cc: R-Help
Subject: Re: [R] Number of Cores limited to two in CRAN

Short answer: You don't want to override that limit in your R package. Don't do it.

Long answer: You'll find the reason for this in the 'CRAN Repository Policy' (https://cran.r-project.org/web/packages/policies.html). Specifically, the following passage:

"Checking the package should take as little CPU time as possible, as the CRAN check farm is a very limited resource and there are thousands of packages. Long-running tests and vignette code can be made optional for checking, but do ensure that the checks that are left do exercise all the features of the package. **If running a package uses multiple threads/cores it must never use more than two simultaneously: the check farm is a shared resource and will typically be running many checks simultaneously.** Examples should run for no more than a few seconds each: they are intended to exemplify to the would-be user how to use the functions in the package."

Basically, you can use two cores to demonstrate or validate (e.g., in package tests) that your code *can* run in parallel, but you must not use more than that to demonstrate that your code can "run super fast".

Even-longer answer: See my blog post 'Please Avoid detectCores() in your R Packages' (https://www.jottr.org/2022/12/05/avoid-detectcores/) from 2022-12-05 for even more reasons.

/Henrik

On Sun, Jul 2, 2023 at 9:55 AM Ravi Varadhan via R-help wrote:
>
> This is the specific error message from R CMD check --as-cran:
>
> Error in .check_ncores(length(names)) : 16 simultaneous processes spawned
> Calls: prepost -> makeCluster -> makePSOCKcluster -> .check_ncores
> Execution halted
>
> Thanks,
> Ravi
>
> From: Ravi Varadhan
> Sent: Saturday, July 1, 2023 1:15 PM
> To: R-Help
> Subject: Number of Cores limited to two in CRAN
>
> Hi,
> I am developing a package where I would like to utilize multiple cores for
> parallel computing. However, I get an error message when I run R CMD check
> --as-cran.
>
> I read that CRAN limits the number of cores to 2. Is this correct? Is there
> any way to overcome this limitation?
> Thank you,
> Ravi
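As a concrete illustration of staying within this policy, here is a minimal sketch of how a package's examples or tests could cap their worker count (the use of the 'parallelly' package is an assumption here, not something from the thread; it is the approach Henrik's blog post advocates):

## Cap the worker count at 2 in examples/tests.
## parallelly::availableCores() is designed to honor the
## _R_CHECK_LIMIT_CORES_ setting that 'R CMD check --as-cran' uses.
n_workers <- if (requireNamespace("parallelly", quietly = TRUE)) {
  min(2L, parallelly::availableCores())
} else {
  2L
}
cl <- parallel::makeCluster(n_workers)
res <- parallel::parLapply(cl, seq_len(4), function(i) i^2)
parallel::stopCluster(cl)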
Re: [R] Number of Cores limited to two in CRAN
This is the specific error message from R CMD check --as-cran:

Error in .check_ncores(length(names)) : 16 simultaneous processes spawned
Calls: prepost -> makeCluster -> makePSOCKcluster -> .check_ncores
Execution halted

Thanks,
Ravi

____
From: Ravi Varadhan
Sent: Saturday, July 1, 2023 1:15 PM
To: R-Help
Subject: Number of Cores limited to two in CRAN

Hi,
I am developing a package where I would like to utilize multiple cores for parallel computing. However, I get an error message when I run R CMD check --as-cran.

I read that CRAN limits the number of cores to 2. Is this correct? Is there any way to overcome this limitation?

Thank you,
Ravi
[R] Number of Cores limited to two in CRAN
Hi,
I am developing a package where I would like to utilize multiple cores for parallel computing. However, I get an error message when I run R CMD check --as-cran.

I read that CRAN limits the number of cores to 2. Is this correct? Is there any way to overcome this limitation?

Thank you,
Ravi
Re: [R] How to use ifelse without invoking warnings
Thank you to Bert, Sarah, and John. I did consider suppressing warnings, but I felt that there must be a more principled approach. While John's solution is what I would prefer, I cannot help but wonder why `ifelse' was not constructed to avoid this behavior.

Thanks & Best regards,
Ravi

From: John Fox
Sent: Thursday, October 7, 2021 2:00 PM
To: Ravi Varadhan
Cc: R-Help
Subject: Re: [R] How to use ifelse without invoking warnings

Dear Ravi,

It's already been suggested that you could disable warnings, but that's risky in case there's a warning that you didn't anticipate. Here's a different approach:

> kk <- k[k >= -1 & k <= n]
> ans <- numeric(length(k))
> ans[k > n] <- 1
> ans[k >= -1 & k <= n] <- pbeta(p, kk + 1, n - kk, lower.tail=FALSE)
> ans
[1] 0.0 0.006821826 0.254991551 1.0

BTW, I don't think that you mentioned that p = 0.3, but that seems apparent from the output you showed.

I hope this helps,
John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-10-07 12:29 p.m., Ravi Varadhan via R-help wrote:
> Hi,
> I would like to execute the following vectorized calculation:
>
>   ans <- ifelse (k >= -1 & k <= n, pbeta(p, k+1, n-k, lower.tail = FALSE),
>                  ifelse (k < -1, 0, 1) )
>
> For example:
>
>> k <- c(-1.2, -0.5, 1.5, 10.4)
>> n <- 10
>> ans <- ifelse (k >= -1 & k <= n, pbeta(p, k+1, n-k, lower.tail=FALSE),
>>                ifelse (k < -1, 0, 1) )
> Warning message:
> In pbeta(p, k + 1, n - k, lower.tail = FALSE) : NaNs produced
>> print(ans)
> [1] 0.0 0.006821826 0.254991551 1.0
>
> The answer is correct. However, I would like to eliminate the annoying
> warnings. Is there a better way to do this?
>
> Thank you,
> Ravi
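For reference, a self-contained version of John's approach (p = 0.3 is an assumption, inferred from the output shown in the thread):

p <- 0.3
k <- c(-1.2, -0.5, 1.5, 10.4)
n <- 10
ans <- numeric(length(k))        # zeros already cover the k < -1 case
ans[k > n] <- 1
inside <- k >= -1 & k <= n
kk <- k[inside]
ans[inside] <- pbeta(p, kk + 1, n - kk, lower.tail = FALSE)
ans   # 0.000000000 0.006821826 0.254991551 1.000000000, with no warnings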
[R] How to use ifelse without invoking warnings
Hi,
I would like to execute the following vectorized calculation:

ans <- ifelse (k >= -1 & k <= n, pbeta(p, k+1, n-k, lower.tail = FALSE),
               ifelse (k < -1, 0, 1) )

For example:

> k <- c(-1.2, -0.5, 1.5, 10.4)
> n <- 10
> ans <- ifelse (k >= -1 & k <= n, pbeta(p, k+1, n-k, lower.tail=FALSE),
+                ifelse (k < -1, 0, 1) )
Warning message:
In pbeta(p, k + 1, n - k, lower.tail = FALSE) : NaNs produced
> print(ans)
[1] 0.0 0.006821826 0.254991551 1.0

The answer is correct. However, I would like to eliminate the annoying warnings. Is there a better way to do this?

Thank you,
Ravi
Re: [R] A Question about MLE in R
I agree with John that SANN should be removed from optim. More importantly, the default choice of optimizer in optim should be changed from "Nelder-Mead" to "BFGS." Nelder-Mead is a bad choice for the most commonly encountered optimization problems in statistics. I really do not see a good reason for not changing the default in optim.

Best regards,
Ravi
Re: [R] Fast matrix multiplication
Does Microsoft R Open come with a pre-compiled BLAS that is optimized for matrix computations?

Thanks,
Ravi

-----Original Message-----
From: Ista Zahn
Sent: Monday, August 13, 2018 3:18 PM
To: Ravi Varadhan
Cc: r-help@r-project.org
Subject: Re: [R] Fast matrix multiplication

On Mon, Aug 13, 2018 at 2:41 PM Ravi Varadhan wrote:
>
> Hi Ista,
> Thank you for the response. I use Windows. Is there a pre-compiled version
> of openBLAS for windows that would make it easy for me to use it?

Not sure. If you want an easy way I would use MRO. More info at https://mran.microsoft.com/rro#intelmkl1

--Ista

> Thanks,
> Ravi
>
> -----Original Message-----
> From: Ista Zahn
> Sent: Friday, August 10, 2018 12:20 PM
> To: Ravi Varadhan
> Cc: r-help@r-project.org
> Subject: Re: [R] Fast matrix multiplication
>
> Hi Ravi,
>
> You can achieve substantial speedup by using a faster BLAS (e.g., OpenBLAS
> or MKL), especially on systems with multiple CPUs. On my (6 year old, but 8
> core) system your example takes 3.9 seconds using the reference BLAS and
> only 0.9 seconds using OpenBLAS.
>
> Best,
> Ista
>
> On Fri, Aug 10, 2018 at 11:46 AM Ravi Varadhan wrote:
> >
> > Hi,
> >
> > I would like to compute: A %*% B %*% t(A)
> >
> > A is a mxn matrix and B is an nxn symmetric, positive-definite matrix,
> > where m is large relative to n (e.g., m=50,000 and n=100).
> >
> > Here is a sample code.
> >
> > M <- 1
> > N <- 100
> > A <- matrix(rnorm(M*N), M, N)
> > B <- crossprod(matrix(rnorm(N*N), N, N)) # creating a symmetric
> > positive-definite matrix
> >
> > # method 1
> > system.time(D <- A %*% B %*% t(A))
> >
> > # I can obtain speedup by using a Cholesky decomposition of B
> > # method 2
> > system.time({
> >   C <- t(chol(B))
> >   E <- tcrossprod(A %*% C)
> > })
> >
> > all.equal(D, E)
> >
> > I am wondering how to obtain more substantial speedup. Any suggestions
> > would be greatly appreciated.
> >
> > Thanks,
> > Ravi
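A quick way to check which BLAS/LAPACK libraries a given R session is actually using (a sketch; in recent versions of R, sessionInfo() prints the linked library paths):

## Inspect the linear-algebra libraries this R session is linked against.
sessionInfo()   # look for the "BLAS:" and "LAPACK:" lines near the top
La_version()    # version of LAPACK in use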
Re: [R] Fast matrix multiplication
Hi Ista,
Thank you for the response. I use Windows. Is there a pre-compiled version of openBLAS for windows that would make it easy for me to use it?

Thanks,
Ravi

-----Original Message-----
From: Ista Zahn
Sent: Friday, August 10, 2018 12:20 PM
To: Ravi Varadhan
Cc: r-help@r-project.org
Subject: Re: [R] Fast matrix multiplication

Hi Ravi,

You can achieve substantial speedup by using a faster BLAS (e.g., OpenBLAS or MKL), especially on systems with multiple CPUs. On my (6 year old, but 8 core) system your example takes 3.9 seconds using the reference BLAS and only 0.9 seconds using OpenBLAS.

Best,
Ista

On Fri, Aug 10, 2018 at 11:46 AM Ravi Varadhan wrote:
>
> Hi,
>
> I would like to compute: A %*% B %*% t(A)
>
> A is a mxn matrix and B is an nxn symmetric, positive-definite matrix, where
> m is large relative to n (e.g., m=50,000 and n=100).
>
> Here is a sample code.
>
> M <- 1
> N <- 100
> A <- matrix(rnorm(M*N), M, N)
> B <- crossprod(matrix(rnorm(N*N), N, N)) # creating a symmetric
> positive-definite matrix
>
> # method 1
> system.time(D <- A %*% B %*% t(A))
>
> # I can obtain speedup by using a Cholesky decomposition of B
> # method 2
> system.time({
>   C <- t(chol(B))
>   E <- tcrossprod(A %*% C)
> })
>
> all.equal(D, E)
>
> I am wondering how to obtain more substantial speedup. Any suggestions would
> be greatly appreciated.
>
> Thanks,
> Ravi
[R] Fast matrix multiplication
Hi,

I would like to compute: A %*% B %*% t(A)

A is a mxn matrix and B is an nxn symmetric, positive-definite matrix, where m is large relative to n (e.g., m=50,000 and n=100).

Here is a sample code.

M <- 1
N <- 100
A <- matrix(rnorm(M*N), M, N)
B <- crossprod(matrix(rnorm(N*N), N, N)) # creating a symmetric positive-definite matrix

# method 1
system.time(D <- A %*% B %*% t(A))

# I can obtain speedup by using a Cholesky decomposition of B
# method 2
system.time({
  C <- t(chol(B))
  E <- tcrossprod(A %*% C)
})

all.equal(D, E)

I am wondering how to obtain more substantial speedup. Any suggestions would be greatly appreciated.

Thanks,
Ravi
Re: [R] adding overall constraint in optim()
Here is what you do for your problem:

require(BB)
Mo.vect <- as.vector(tail(head(mo, i), 1))
wgt.vect <- as.vector(tail(head(moWeightsMax, i), 1))
cov.mat <- cov(tail(head(morets, i+12), 12))
opt.fun <- function(wgt.vect) -sum(Mo.vect %*% wgt.vect) /
  (t(wgt.vect) %*% (cov.mat %*% wgt.vect))
LowerBounds <- c(0.2, 0.05, 0.1, 0, 0, 0)
UpperBounds <- c(0.6, 0.3, 0.6, 0.15, 0.1, 0.2)
spgSolution <- spg(wgt.vect, fn=opt.fun, lower=LowerBounds, upper=UpperBounds,
                   project="projectLinear",
                   projectArgs=list(A=matrix(1, 1, length(wgt.vect)), b=1, meq=1))

Ravi

____
From: Ravi Varadhan
Sent: Saturday, May 5, 2018 12:31 PM
To: m.ash...@enduringinvestments.com; r-help@r-project.org
Subject: adding overall constraint in optim()

Hi,
You can use the projectLinear argument in BB::spg to optimize with linear equality/inequality constraints. Here is how you implement the constraint that all parameters sum to 1.

require(BB)
spg(par=p0, fn=myFn, project="projectLinear",
    projectArgs=list(A=matrix(1, 1, length(p0)), b=1, meq=1))

Hope this is helpful,
Ravi
[R] adding overall constraint in optim()
Hi,
You can use the projectLinear argument in BB::spg to optimize with linear equality/inequality constraints. Here is how you implement the constraint that all parameters sum to 1.

require(BB)
spg(par=p0, fn=myFn, project="projectLinear",
    projectArgs=list(A=matrix(1, 1, length(p0)), b=1, meq=1))

Hope this is helpful,
Ravi
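A runnable toy illustration of the same idea (the quadratic objective and target weights below are made up for demonstration; they are not from the thread):

## Minimize a quadratic distance to target weights, subject to
## sum(w) = 1 (equality, meq = 1) and box constraints 0 <= w <= 1.
require(BB)
target <- c(0.4, 0.3, 0.2, 0.1)
obj <- function(w) sum((w - target)^2)
p0 <- rep(0.25, 4)
ans <- spg(par=p0, fn=obj, lower=rep(0, 4), upper=rep(1, 4),
           project="projectLinear",
           projectArgs=list(A=matrix(1, 1, 4), b=1, meq=1))
ans$par   # close to 'target', which already sums to 1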
Re: [R] Subject: glm and stepAIC selects too many effects
More principled would be to use a lasso-type approach, which combines selection and estimation in one fell swoop!

Ravi

From: Ravi Varadhan
Sent: Tuesday, June 6, 2017 10:16 AM
To: r-help@r-project.org
Subject: Subject: [R] glm and stepAIC selects too many effects

If AIC is giving you a model that is too large, then use BIC (log(n) as the penalty for adding a term in the model). This will yield a more parsimonious model. Now, if you ask me which is the better option, I have to refer you to the huge literature on model selection.

Best,
Ravi
[R] Subject: glm and stepAIC selects too many effects
If AIC is giving you a model that is too large, then use BIC (log(n) as the penalty for adding a term in the model). This will yield a more parsimonious model. Now, if you ask me which is the better option, I have to refer you to the huge literature on model selection.

Best,
Ravi
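In practice, the BIC penalty can be requested directly from MASS::stepAIC via its k argument (a sketch; 'fit' and 'mydata' are hypothetical placeholders for the fitted model and data):

library(MASS)
## k = 2 gives AIC (the default); k = log(n) gives the BIC penalty.
n <- nrow(mydata)
fit_bic <- stepAIC(fit, k = log(n))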
Re: [R] Haplo.glm error: Failed to converge during EM-glm loop
It is a "warning" message, but not an "error" message. Therefore, you should still get parameter estimates. However, lack of convergence is often an important issue, although in the case of the EM algorithm it may not be so serious, since EM is notoriously slow to converge when you set a small tolerance for convergence. You may fiddle with the control options for the EM algorithm and see if you can get convergence. If not, you should contact the package authors.

Hope this is helpful,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
[R] Linear Regressions with non-negativity constraint
Hi,
Take a look at the package "ic.infer" by Ulrike Grömping.
https://www.jstatsoft.org/article/view/v033i10

Best,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
Re: [R] help in maximum likelihood
Standard error = sqrt(diag(solve(opt$hessian)))

Ravi

From: Alaa Sindi [mailto:alaasi...@icloud.com]
Sent: Wednesday, March 02, 2016 3:22 PM
To: Ravi Varadhan
Cc: r-help@r-project.org
Subject: Re: help in maximum likelihood

Thank you very much prof. Ravi,

That was very helpful. Is there a way to get the t and p value for the coefficients?

Thanks
Alaa

On Mar 2, 2016, at 10:05 AM, Ravi Varadhan wrote:

There is nothing wrong with the optimization. It is a warning message. However, this is a good example to show that one should not simply dismiss a warning before understanding what it means.

The MLE parameters are also large, indicating that there is something funky about the model or the data or both. In your case, there is one major problem with the data: for the highest dose (value of x), you have all subjects responding, i.e. y = n. Even for the next lower dose, there is almost complete response. Where do these data come from? Are they real or fake (simulated) data?

Also, take a look at the eigenvalues of the hessian at the solution. You will see that there is some ill-conditioning, as the eigenvalues are widely separated.

x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)
y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
n <- c(59, 60, 62, 56, 63, 59, 62, 60)

# note: there is no need to have the choose(n, y) term in the likelihood
fn <- function(p) sum( - (y*(p[1]+p[2]*x) - n*log(1+exp(p[1]+p[2]*x))) )
out <- nlm(fn, p = c(-50, 20), hessian = TRUE)
out
eigen(out$hessian)

Hope this is helpful,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
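A minimal sketch answering Alaa's follow-up about t and p values, building on the nlm() result 'out' from the code above (this assumes the usual asymptotic normality of the MLE, hence Wald-type z statistics):

se <- sqrt(diag(solve(out$hessian)))   # standard errors from the inverse Hessian
z  <- out$estimate / se                # nlm() returns the MLE in $estimate
p  <- 2 * pnorm(-abs(z))               # two-sided p-values
cbind(estimate = out$estimate, se = se, z = z, p.value = p)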
Re: [R] help in maximum likelihood
There is nothing wrong with the optimization. It is a warning message. However, this is a good example to show that one should not simply dismiss a warning before understanding what it means.

The MLE parameters are also large, indicating that there is something funky about the model or the data or both. In your case, there is one major problem with the data: for the highest dose (value of x), you have all subjects responding, i.e. y = n. Even for the next lower dose, there is almost complete response. Where do these data come from? Are they real or fake (simulated) data?

Also, take a look at the eigenvalues of the hessian at the solution. You will see that there is some ill-conditioning, as the eigenvalues are widely separated.

x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)
y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
n <- c(59, 60, 62, 56, 63, 59, 62, 60)

# note: there is no need to have the choose(n, y) term in the likelihood
fn <- function(p) sum( - (y*(p[1]+p[2]*x) - n*log(1+exp(p[1]+p[2]*x))) )
out <- nlm(fn, p = c(-50, 20), hessian = TRUE)
out
eigen(out$hessian)

Hope this is helpful,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
[R] Looking for a data set - Teratology data of Chen and Kodell (1989)
Hi,

I am looking for the teratology data set from the Chen and Kodell (JASA 1989) paper on mice exposed to different doses of a chemical, DEHP. This was also analyzed in a Biometrics 2000 paper by Louise Ryan using GEE. I think this data set is publicly available, but I am unable to locate it. Does anyone know of this data set or worked with it? If so, can someone share it with me?

Thank you very much.

Best,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
Re: [R] Solve an equation including integral
There was a mistake in the previously sent function. I had hard-coded the values of the parameters `nu' and `exi'. Use this one:

myroot <- function(t, nu, exi, alpha){
  fun <- function(t, exi, nu) (nu+t^2)^(-(nu+1)/2) * exp(((nu+1)*exi*t)/((nu+t^2)^0.5))
  res <- alpha - integrate(fun, -Inf, t, nu=nu, exi=exi)$value
  return(res)
}

uniroot(myroot, c(-2, 2), nu=2, exi=0.5, alpha=.05)

Ravi

From: Ravi Varadhan
Sent: Friday, January 08, 2016 11:29 AM
To: r-help@r-project.org; 'sstol...@gmail.com'
Subject: Re: Solve an equation including integral

I think this is what you want:

myroot <- function(t, nu, exi, alpha){
  fun <- function(t, exi, nu) (nu+t^2)^(-(nu+1)/2) * exp(((nu+1)*exi*t)/((nu+t^2)^0.5))
  res <- alpha - integrate(fun, -Inf, t, nu=2, exi=0.5)$value
  return(res)
}

uniroot(myroot, c(-2, 2), nu=2, exi=0.5, alpha=.05)

Hope this is helpful,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
Re: [R] Solve an equation including integral
I think this is what you want:

myroot <- function(t, nu, exi, alpha){
  fun <- function(t, exi, nu) (nu+t^2)^(-(nu+1)/2) * exp(((nu+1)*exi*t)/((nu+t^2)^0.5))
  res <- alpha - integrate(fun, -Inf, t, nu=2, exi=0.5)$value
  return(res)
}

uniroot(myroot, c(-2, 2), nu=2, exi=0.5, alpha=.05)

Hope this is helpful,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
[R] grep/regexp
Hi,

I would appreciate some help with using grep(). I have a bunch of variables in a data frame and I would like to select some of them using grep. Here is an example of what I am trying to do:

vars <- c("Fr_I_total", "Fr_I_percent_of_CD4", "Ki.67_in_Fr_I_percent_of_Fr_I",
          "Fr_II_percent_of_CD4", "Ki.67_in_Fr_II_percent_of_Fr_II")

From the above vector, I would like to select those variables beginning with `Fr' and containing `percent' in them. In other words, I would like to get the variables "Fr_I_percent_of_CD4" and "Fr_II_percent_of_CD4".

How can I use grep() to do this? More generally, are there any good online resources with examples like this for the use of grep() and regexpr() in R? I didn't find the help pages for these very user-friendly.

Thank you very much,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
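A minimal sketch of one way to do this, either by combining two simple patterns or with a single regular expression:

## names that start with "Fr" AND contain "percent"
vars[grepl("^Fr", vars) & grepl("percent", vars)]
## the same selection with one pattern
grep("^Fr.*percent", vars, value = TRUE)
## both return: "Fr_I_percent_of_CD4" "Fr_II_percent_of_CD4"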
Re: [R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability
Hi John,
My main point is not about Nelder-Mead per se. It is *primarily* about the Nelder-Mead implementation in optim(). The users of optim() should be cautioned regarding the default algorithm and that they should consider alternatives such as "BFGS" in optim(), or other implementations of Nelder-Mead.

Best regards,
Ravi

From: ProfJCNash
Sent: Sunday, November 15, 2015 12:21 PM
To: Ravi Varadhan; 'r-help@r-project.org'; lorenzo.ise...@gmail.com
Cc: b...@xs4all.nl; Gabor Grothendieck
Subject: Re: Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability

Not contradicting Ravi's message, but I wouldn't say Nelder-Mead is "bad" per se. Its issues are that it assumes the parameters are all on the same scale, and the termination (not convergence) test can't use gradients, so it tends to get "near" the optimum very quickly -- say only 10% of the computational effort -- then spends an awful amount of effort deciding it's got there. It often will do poorly when the function has nearly "flat" zones, e.g., a long valley with very low slope.

So my message is still that Nelder-Mead is an unfortunate default -- it has been chosen I believe because it is generally robust and doesn't need gradients. BFGS really should use accurate gradients, preferably computed analytically, so it would only be a good default in that case or with very good approximate gradients (which are costly computationally).

However, if you understand what NM is doing, and use it accordingly, it is a valuable tool. I generally use it as a first try BUT turn on the trace to watch what it is doing as a way to learn a bit about the function I am minimizing. Rarely would I use it as a production minimizer.

Best,
JN

On 15-11-15 12:02 PM, Ravi Varadhan wrote:
> Hi,
>
> While I agree with the comments about paying attention to parameter
> scaling, a major issue here is that the default optimization algorithm,
> Nelder-Mead, is not very good. It is unfortunate that the optim
> implementation chose this as the "default" algorithm. I have several
> instances where people have come to me with poor results from using
> optim(), because they did not realize that the default algorithm is
> bad. We (John Nash and I) have pointed this out before, but the R core
> has not addressed this issue due to backward compatibility reasons.
>
> There is a better implementation of Nelder-Mead in the "dfoptim" package.
>
> require(dfoptim)
> mm_def1 <- nmk(par = par_ini1, min.perc_error, data = data)
> mm_def2 <- nmk(par = par_ini2, min.perc_error, data = data)
> mm_def3 <- nmk(par = par_ini3, min.perc_error, data = data)
> print(mm_def1$par)
> print(mm_def2$par)
> print(mm_def3$par)
>
> In general, better implementations of optimization algorithms are
> available in packages such as "optimx", "nloptr". It is unfortunate
> that most naïve users of optimization in R do not recognize this.
> Perhaps, there should be a "message" in the optim help file that points
> this out to the users.
>
> Hope this is helpful,
> Ravi
[R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability
Hi,

While I agree with the comments about paying attention to parameter scaling, a major issue here is that the default optimization algorithm, Nelder-Mead, is not very good. It is unfortunate that the optim implementation chose this as the "default" algorithm. I have several instances where people have come to me with poor results from using optim(), because they did not realize that the default algorithm is bad. We (John Nash and I) have pointed this out before, but the R core has not addressed this issue due to backward compatibility reasons.

There is a better implementation of Nelder-Mead in the "dfoptim" package.

require(dfoptim)
mm_def1 <- nmk(par = par_ini1, min.perc_error, data = data)
mm_def2 <- nmk(par = par_ini2, min.perc_error, data = data)
mm_def3 <- nmk(par = par_ini3, min.perc_error, data = data)
print(mm_def1$par)
print(mm_def2$par)
print(mm_def3$par)

In general, better implementations of optimization algorithms are available in packages such as "optimx", "nloptr". It is unfortunate that most naïve users of optimization in R do not recognize this. Perhaps, there should be a "message" in the optim help file that points this out to the users.

Hope this is helpful,
Ravi
Re: [R] About error: L-BFGS-B needs finite values of 'fn'
It seems like there is substantial finite-sample bias in the MLEs. Either that or there is some error in your procedure. See attached code.

Ravi

From: Ravi Varadhan
Sent: Wednesday, November 11, 2015 2:33 PM
To: 'denizozo...@gazi.edu.tr'; r-help@r-project.org
Cc: 'profjcn...@gmail.com'
Subject: Re: [R] About error: L-BFGS-B needs finite values of 'fn'

With a small sample size, n=30, you will have realizations of data where you will run into difficulties with the MLE of the generalized gamma distribution. This is mainly due to the `k' parameter. Increase the sample size (e.g., n=50 or 100) and this problem is less likely to happen (but can still happen).

I would strongly suggest that when you are doing simulations, you should encapsulate the parameter estimation inside a `try' or `tryCatch' statement so that when there is an error, the simulation keeps going rather than crashing out. See the attached code.

Best,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
Re: [R] About error: L-BFGS-B needs finite values of 'fn'
With a small sample size, n=30, you will have realizations of data where you will run into difficulties with the MLE of the generalized gamma distribution. This is mainly due to the `k' parameter. Increase the sample size (e.g., n=50 or 100) and this problem is less likely to happen (but can still happen).

I would strongly suggest that when you are doing simulations, you should encapsulate the parameter estimation inside a `try' or `tryCatch' statement so that when there is an error, the simulation keeps going rather than crashing out. See the attached code.

Best,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
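A minimal sketch of the tryCatch pattern described above (the attached code is not in the digest; the fitting call, 'negloglik', and the simulated data below are hypothetical placeholders):

nsim <- 1000
estimates <- vector("list", nsim)
for (i in seq_len(nsim)) {
  dat <- rgamma(30, shape = 2, rate = 1)   # stand-in for one simulated sample
  fit <- tryCatch(
    optim(par = c(1, 1), fn = negloglik, x = dat, method = "L-BFGS-B",
          lower = c(1e-6, 1e-6)),
    error = function(e) NULL               # record the failure, keep going
  )
  if (!is.null(fit)) estimates[[i]] <- fit$par
}
## failed replicates remain NULL and can be counted afterwards:
sum(vapply(estimates, is.null, logical(1)))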
Re: [R] How to get variable name while doing series of regressions in an automated manner?
Thank you very much, Marc & Bert.

Bert - I think you're correct. With Marc's solution, I am not able to get the response variable name in the call to lm(). But, your solution works well.

Best regards,
Ravi

-----Original Message-----
From: Bert Gunter [mailto:bgunter.4...@gmail.com]
Sent: Tuesday, October 27, 2015 1:50 PM
To: Ravi Varadhan
Cc: r-help@r-project.org
Subject: Re: [R] How to get variable name while doing series of regressions in an automated manner?

Marc, Ravi:

I may misunderstand, but I think Marc's solution labels the list components but not necessarily the summary() outputs. This might be sufficient, as in:

> z <- list(y1=rnorm(10,5), y2=rnorm(10,8), x=1:10)
>
> ## 1
> results1 <- lapply(z[-3], function(y) lm(log(y)~x, data=z))
> lapply(results1, summary)
$y1

Call:
lm(formula = log(y) ~ x, data = z)

Residuals:
    Min      1Q  Median      3Q     Max
-0.2185 -0.1259 -0.0643  0.1340  0.3988

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
x           -0.01495    0.02317  -0.645    0.537
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2104 on 8 degrees of freedom
Multiple R-squared:  0.04945,  Adjusted R-squared:  -0.06937
F-statistic: 0.4161 on 1 and 8 DF,  p-value: 0.5369

$y2

Call:
lm(formula = log(y) ~ x, data = z)

Residuals:
      Min        1Q    Median        3Q       Max
-0.229072 -0.094579 -0.006498  0.134303  0.188158

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
x           -0.006226   0.016778  -0.371     0.72
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1524 on 8 degrees of freedom
Multiple R-squared:  0.01692,  Adjusted R-squared:  -0.106
F-statistic: 0.1377 on 1 and 8 DF,  p-value: 0.7202

## 2
Alternatively, if you want output with the correct variable names, bquote() can be used, as in:

> results2 <- lapply(names(z)[1:2],
+    function(nm){
+      fo <- formula(paste0("log(", nm, ")~x"))
+      eval(bquote(lm(.(u), data=z), list(u=fo)))
+    })
> lapply(results2, summary)
[[1]]

Call:
lm(formula = log(y1) ~ x, data = z)

Residuals:
    Min      1Q  Median      3Q     Max
-0.2185 -0.1259 -0.0643  0.1340  0.3988

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
x           -0.01495    0.02317  -0.645    0.537
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2104 on 8 degrees of freedom
Multiple R-squared:  0.04945,  Adjusted R-squared:  -0.06937
F-statistic: 0.4161 on 1 and 8 DF,  p-value: 0.5369

[[2]]

Call:
lm(formula = log(y2) ~ x, data = z)

Residuals:
      Min        1Q    Median        3Q       Max
-0.229072 -0.094579 -0.006498  0.134303  0.188158

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
x           -0.006226   0.016778  -0.371     0.72
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1524 on 8 degrees of freedom
Multiple R-squared:  0.01692,  Adjusted R-squared:  -0.106
F-statistic: 0.1377 on 1 and 8 DF,  p-value: 0.7202

HTH or apologies if I've missed the point and broadcasted noise.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll

On Tue, Oct 27, 2015 at 8:19 AM, Ravi Varadhan wrote:
> Hi,
>
> I am running through a series of regressions in a loop as follows:
>
> results <- vector("list", length(mydata$varnames))
>
> for (i in 1:length(mydata$varnames)) {
>   results[[i]] <- summary(lm(log(eval(parse(text=varnames[i]))) ~ age + sex +
>                              CMV.status, data=mydata))
> }
>
> Now, when I look at the results[[i]] objects, I won't be able to see the
> original variable names. Obviously, I will only see the following:
>
> Call:
> lm(formula = log(eval(parse(text = varnames[i]))) ~ age + sex + CMV.status,
>    data = mydata)
>
> Is there a way to display the original variable names on the LHS? In
> addition, is there a better paradigm for doing this type of series of
> regressions in an automatic fashion?
>
> Thank you very much,
> Ravi
>
> Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
> Associate Professor, Department of Oncology
> Division of Biostatistics & Bioinformatics
> Sidney Kimmel Comprehensive Cancer Center
> Johns Hopkins University
> 550 N. Broadway, Suite -E
> Baltimore, MD 21205
> 410-502-2619
[R] How to get variable name while doing series of regressions in an automated manner?
Hi,

I am running through a series of regressions in a loop as follows:

results <- vector("list", length(mydata$varnames))

for (i in 1:length(mydata$varnames)) {
  results[[i]] <- summary(lm(log(eval(parse(text=varnames[i]))) ~ age + sex +
                             CMV.status, data=mydata))
}

Now, when I look at the results[[i]] objects, I won't be able to see the original variable names. Obviously, I will only see the following:

Call:
lm(formula = log(eval(parse(text = varnames[i]))) ~ age + sex + CMV.status,
   data = mydata)

Is there a way to display the original variable names on the LHS? In addition, is there a better paradigm for doing this type of series of regressions in an automatic fashion?

Thank you very much,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
Re: [R] Linear regression with a rounded response variable
Dear Peter, Charles, Gabor, Jim, Mark, Victor, Peter, and Harold,

You have given me plenty of ammunition. Thank you very much for the useful answers.

Gratefully,
Ravi

From: peter dalgaard
Sent: Wednesday, October 21, 2015 8:11 PM
To: Charles C. Berry
Cc: Ravi Varadhan; r-help@r-project.org
Subject: Re: [R] Linear regression with a rounded response variable

> On 21 Oct 2015, at 19:57 , Charles C. Berry wrote:
>
> On Wed, 21 Oct 2015, Ravi Varadhan wrote:
>
>> [snippage]
>
> If half the subjects have a value of 5 seconds and the rest are split between
> 4 and 6, your assertion that rounding induces an error of
> dunif(epsilon, -0.5, 0.5) is surely wrong (more positive errors in the 6 second
> group and more negative errors in the 4 second group under any plausible
> model).

Yes, and I think that the suggestion in another post to look at censored regression is more in the right direction.

In general, I'd expect the bias caused by rounding the response to be quite small, except at very high granularity. I did a few small experiments with the simplest possible linear model: estimating a mean based on highly rounded data,

> y <- round(rnorm(1e2, pi, .5))
> mean(y)
[1] 3.12
> table(y)
y
 2  3  4  5
13 63 23  1

Or, using a bigger sample:

> mean(round(rnorm(1e8, pi, .5)))
[1] 3.139843

in which there is a visible bias, but quite a small one:

> pi - 3.139843
[1] 0.001749654

At lower granularity (sd=1 instead of .5), the bias has almost disappeared.

> mean(round(rnorm(1e8, pi, 1)))
[1] 3.141577

If the granularity is increased sufficiently, you _will_ see a sizeable bias (because almost all observations will be round(pi)==3):

> mean(round(rnorm(1e8, pi, .1)))
[1] 3.00017

A full ML fit (with known sigma) is pretty easily done:

> library(stats4)
> mll <- function(mu) -sum(log(pnorm(y+.5, mu, .5) - pnorm(y-.5, mu, .5)))
> mle(mll, start=list(mu=3))

Call:
mle(minuslogl = mll, start = list(mu = 3))

Coefficients:
      mu
3.122069

> mean(y)
[1] 3.12

As you see, the difference is only 0.002. A small simulation (1000 repl.) gave (r[1,]==MLE; r[2,]==mean)

> summary(r[1,] - r[2,])
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.004155  0.000702  0.001495  0.001671  0.002554  0.006860

so the corrections relative to the crude mean stay within one unit in the 2nd place. Notice that the corrections are pretty darn close to cancelling out the bias.

-pd

> HTH,
>
> Chuck

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
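The censored-regression suggestion above can be carried out as interval regression, since a recorded value y is only known to lie in [y - 0.5, y + 0.5]. A minimal sketch using the survival package (the data frame 'walkdat' and the predictors are hypothetical placeholders; this is one way to implement the idea, not code from the thread):

library(survival)
## Each rounded response is treated as interval-censored in
## [y - 0.5, y + 0.5]; dist = "gaussian" gives a linear model on that scale.
fit <- survreg(Surv(y - 0.5, y + 0.5, type = "interval2") ~ age + height + sex,
               data = walkdat, dist = "gaussian")
summary(fit)
predict(fit)   # real-valued predictions, not integers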
[R] Linear regression with a rounded response variable
Hi,

I am dealing with a regression problem where the response variable, time (seconds) to walk 15 ft, is rounded to the nearest integer. I do not care for the regression coefficients per se, but my main interest is in getting the prediction equation for walking speed, given the predictors (age, height, sex, etc.), where the predictions will be real numbers, and not integers. The hope is that these predictions should provide unbiased estimates of the "unrounded" walking speed.

This sounds like a measurement error problem, where the measurement error is due to rounding and hence would be uniformly distributed (-0.5, 0.5). Are there any canonical approaches for handling this type of a problem? What is wrong with just doing the standard linear regression? I googled and saw that this question was asked by someone else in a stackexchange post, but it was unanswered.

Any suggestions?

Thank you,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
Re: [R] Bug in auglag?
Dear Rainer,

This is NOT a bug in auglag. I already mentioned that auglag() can work with infeasible starting values, which also implies that the function must be evaluable at infeasible values. A simple solution to your problem would be to fix up your objective function such that it evaluates to `Inf' or some large value when the parameter values are not in the constrained domain. constrOptim.nl() is a barrier method, so it forces the initial value and the subsequent iterates to be feasible.

Best,
Ravi

From: Rainer M Krug
Sent: Tuesday, October 6, 2015 9:20 AM
To: Ravi Varadhan
Cc: 'r-help@r-project.org'
Subject: Bug in auglag?

Hi Ravi,

I would like to come back to your offer. I have a problem which possibly is caused by a bug or by something I don't understand: my function to be minimised is executed even when an element in hin() is negative.

My hin looks as follows:

--8<---cut here---start->8---
hinMahat <- function(x, hauteur, na, zjoint, y, LAI, ...) {
  if (x[1] < 0) {
    cat(names(list(...)), "\n")
    cat(..., "\n")
    cat(x, "|", hauteur, LAI, y, "\n")
  }
  h <- rep(NA, 8)
  if (!missing(na)) {
    x <- c(na, x)
  }
  if (!missing(y)) {
    x <- c(x, y)
  }
  if (!missing(zjoint)) {
    x <- c(x[1], zjoint, x[2])
  }
  ##
  dep <- hauteur * (0.05 + LAI^0.02 / 2) + (x[3] - 1)/20
  h[1] <- dep
  h[2] <- hauteur - dep
  ## if (h[2]==0) {
  ##   h[2] <- -1
  ## }
  ##
  z0 <- hauteur * (0.23 + LAI^0.25 / 10) + (x[3] - 1)/67
  h[3] <- z0
  ## if (h[3]==0) {
  ##   h[3] <- -1
  ## }
  h[4] <- hauteur - z0
  ##
  h[5] <- x[1]
  ##
  h[6] <- x[2]
  h[7] <- hauteur - x[2]
  ##
  h[8] <- hauteur - dep - z0
  if (any(h<=0)) {
    cat(h, "\n")
    cat("\n")
  }
  return(h)
}
--8<---cut here---end--->8---

The x contains up to three elements: c(na=, zjoint=, y=) and I fit these three, unless one or two are specified explicitly.

The values going into hin are:

,----
| ... (z u ua za z0sol )
| 3 11 17 23 29 37 0.315 0.422 0.458 0.556 1.567 1.747 1.747 37 0.001
|
| x(na, zjoint): -8.875735 24.51316
| hauteur: 28
| na: 8.1
| y: 3
`----

The resulting hin() is:

,----
| 16.09815 11.90185 11.19352 16.80648 -8.875735 24.51316 3.486843 0.708335
`----

which is negative in element 5, as x[2]=na is negative. So I would expect that the function fn is not evaluated. But it is, and raises an error:

,----
| Error in wpLELMahat(z = z, ua = ua, na = ifelse(missing(na), par[1], na), :
|   na has to be larger or equal than zero!
`----

Is this a misunderstanding on my part, or is it an error in the function auglag?

Below is the function which is doing the minimisation. If I replace auglag() with constrOptim.nl(), the optimisation is working as expected. So I think this is a bug in auglag?

Let me know if you need further information.

Cheers,
Rainer

--8<---cut here---start->8---
fitAuglag.wpLEL.mahat.single <- function(
  z,
  u,
  LAI,
  initial = c(na=9, zjoint=0.2*2, y=3),
  na,
  zjoint,
  y,
  h = 28,
  za = 37,
  z0sol = 0.001,
  hin,
  ...
) {
  if (missing(hin)) {
    hin <- hinMahat
  }
  wpLELMin <- function(par, na, zjoint, y, z, u, ua, hauteur, za, z0sol, LAI) {
    result <- NA
    try({
      p <- wpLELMahat(
        z = z,
        ua = ua,
        na = ifelse(missing(na), par[1], na),
        zjoint = ifelse(missing(zjoint), par[2], zjoint),
        h = hauteur,
        za = za,
        z0sol = z0sol,
        LAI = LAI,
        y = ifelse(missing(y), par[3], y)
      )
      result <- sum( ( (p$u - u)^2 ) / length(u) )
    },
    silent = FALSE
    )
    ## cat("From wpLELMin", par, "\n")
    return( result )
  }
  ua <- u[length(u)]
  result <- list()
  result$method <- "fitAuglag.wpLEL.mahat.single"
  result$initial <- initial
  result$dot <- list(...)
  result$z <- z
  result$u <- u
  result$fit <- auglag(
    par = initial,
    fn = wpLELMin,
    hin = hin,
    na = na,
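A minimal sketch of the fix Ravi describes -- wrapping the objective so it is evaluable, but heavily penalized, at infeasible points ('fn', 'hin', and 'p0' below stand for the user's objective, constraint function, and starting values; they are placeholders, not code from this thread):

## auglag() may evaluate the objective at infeasible points, so return a
## large value (or Inf) there instead of stopping with an error.
safe_fn <- function(x, ...) {
  if (any(hin(x, ...) <= 0)) return(1e10)  # infeasible: penalize, don't error
  fn(x, ...)                               # feasible: the original objective
}
ans <- auglag(par = p0, fn = safe_fn, hin = hin)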
Re: [R] optimizing with non-linear constraints
I would recommend that you use auglag() rather than constrOptim.nl() in the package "alabama". It is a better algorithm, and it does not require feasible starting values.

Best,
Ravi

-----Original Message-----
From: Rainer M Krug [mailto:rai...@krugs.de]
Sent: Thursday, October 01, 2015 3:37 AM
To: Ravi Varadhan
Cc: 'r-help@r-project.org'
Subject: Re: optimizing with non-linear constraints

Ravi Varadhan writes:

> Hi Rainer,
> It is very simple to specify the constraints (linear or nonlinear) in
> "alabama". They are specified in a function called `hin', where the
> constraints are written such that they are positive.

OK - I somehow missed the part that, when the values x are valid, i.e. in the range as defined by the conditions, the results of hin(x) are all positive.

> Your two nonlinear constraints would be written as follows:
>
> hin <- function(x, LAI) {
>   h <- rep(NA, 2)
>   h[1] <- LAI^x[2] / x[3] + x[1]
>   h[2] <- 1 - x[1] - LAI^x[2] / x[3]
>   h
> }

Makes perfect sense.

> Please take a look at the help page. If it is still not clear, you can
> contact me offline.

Yup - I did. But I somehow missed the fact stated above.

I am using constrOptim() and constrOptim.nl() for a paper and am compiling a separate document which explains how to get the constraints for the two functions step by step - I will make it available as a blog post and a pdf.

I might have further questions concerning the different fitting functions and which ones are the most appropriate in my case.

Thanks a lot,
Rainer

> Best,
> Ravi
>
> Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
> Associate Professor, Department of Oncology
> Division of Biostatistics & Bioinformatics
> Sidney Kimmel Comprehensive Cancer Center
> Johns Hopkins University
> 550 N. Broadway, Suite -E
> Baltimore, MD 21205
> 410-502-2619

--
Rainer M. Krug
email: Rainerkrugsde
PGP: 0x0F52F982
Re: [R] optimizing with non-linear constraints
Hi Rainer,

It is very simple to specify the constraints (linear or nonlinear) in "alabama". They are specified in a function called `hin', where the constraints are written such that they are positive. Your two nonlinear constraints would be written as follows:

hin <- function(x, LAI) {
  h <- rep(NA, 2)
  h[1] <- LAI^x[2] / x[3] + x[1]
  h[2] <- 1 - x[1] - LAI^x[2] / x[3]
  h
}

Please take a look at the help page. If it is still not clear, you can contact me offline.

Best,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor, Department of Oncology
Division of Biostatistics & Bioinformatics
Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
550 N. Broadway, Suite -E
Baltimore, MD 21205
410-502-2619
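For completeness, a sketch of how this hin() plugs into the optimizer (the objective 'myFn', starting values 'p0', and data 'LAI' are placeholders):

library(alabama)
## auglag() enforces hin(x) >= 0 at the solution; extra arguments such as
## LAI are passed through to both fn and hin.
ans <- auglag(par = p0, fn = myFn, hin = hin, LAI = LAI)
ans$par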
[R] lsqlin in pracma
In solve.QP(), you don't need to expand the equality into two inequalities. It can DIRECTLY handle the equality constraints. The first `meq' rows of the constraint matrix are equality constraints. Here is the excerpt from the documentation:

meq: the first meq constraints are treated as equality constraints, all further as inequality constraints (defaults to 0).

Therefore, solve.QP() can provide the full functionality of lsqlin in Matlab. However, one caveat is that the bounds constraints have to be implemented via inequalities in solve.QP(), which is a slight pain, but not a deal breaker.

Best,
Ravi
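A minimal sketch of a constrained least-squares problem ||Cx - d||^2, posed for quadprog::solve.QP() (the matrices C and d are made-up illustration data):

## solve.QP minimizes -dvec'b + (1/2) b'Dmat b subject to
## t(Amat) %*% b >= bvec, with the first meq constraints as equalities.
library(quadprog)
set.seed(1)
C <- matrix(rnorm(20), 10, 2); d <- rnorm(10)
Dmat <- crossprod(C)                # C'C, the quadratic term
dvec <- crossprod(C, d)             # C'd, the linear term
Amat <- cbind(rep(1, 2), diag(2))   # sum(x) = 1 first, then bounds x >= 0
bvec <- c(1, 0, 0)
solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution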
Re: [R] : Ramanujan and the accuracy of floating point computations - using Rmpfr in R
What about numeric constants, like `163'? eval(Pre(exp(sqrt(163)*pi), 120)) does not work.
Thanks, Ravi

From: David Winsemius
Sent: Saturday, July 4, 2015 1:12 PM
To: Duncan Murdoch
Cc: r-help; John Nash; Ravi Varadhan
Subject: Re: [R] : Ramanujan and the accuracy of floating point computations - using Rmpfr in R

> On Jul 4, 2015, at 12:20 AM, Duncan Murdoch wrote:
>
> On 04/07/2015 8:21 AM, David Winsemius wrote:
>>
>>> On Jul 3, 2015, at 11:05 PM, Duncan Murdoch wrote:
>>>
>>> On 04/07/2015 3:45 AM, David Winsemius wrote:
>>>>
>>>>> On Jul 3, 2015, at 5:08 PM, David Winsemius wrote:
>>>>>
>>>>> It doesn't appear to me that mpfr was ever designed to handle expressions as the first argument.
>>>>
>>>> This could be a start. Obviously one would want to include code to do other substitutions, probably using the all.vars function to pull out the other "constants" and 'numeric' values to make them of equivalent precision. I'm guessing you want to follow the parse-tree, test the numbers for integer-ness, and then replace them by paste0("mpfr(", val, "L, ", prec, ")").
>>>>
>>>> Pre <- function(expr, prec){ sexpr <- deparse(substitute(expr))
>>>
>>> Why deparse? That's almost never a good idea. I can't try your code (I don't have mpfr available), but it would be much better to modify the expression than the text representation of it. For example, I think your code would modify strings containing "pi", or variables with those letters in them, etc. If you used substitute(expr) without the deparse(), you could replace the symbol "pi" with the call to the Const function, and be more robust.
>>>
>>
>> Really? I did try. I was fairly sure that someone could do better, but I don't see an open path along the lines you suggest. I'm pretty sure I tried `substitute(expr, list(pi = pi))` when `expr` had been the formal argument and got disappointed because there is no `pi` in the expression `expr`. I _thought_ the problem was that `substitute` does not evaluate its first argument, but I do admit to being pretty much of a klutz with this sort of programming. I don't think you need to have mpfr installed in order to demonstrate this.
>
> The substitute() function really does two different things. substitute(expr) (with no second argument) grabs the underlying expression out of a promise. substitute(expr, list(pi = pi)) tries to make the substitution in the expression "expr", so it doesn't see "pi".

Thank you. That was really helpful. I hope it "sticks" to a sufficiently durable set of neurons.

> This should work:
>
> do.call(substitute, list(expr = substitute(expr), env = list(pi = Const(pi, 120))))
>
> (but I can't evaluate the Const function to test it).
The expression `pi` as the argument to Const only needed to be character()-ized, and then evaluation on that result succeeded:

library(Rmpfr)

Pre <- function(expr, prec){
  do.call(substitute, list(expr = substitute(expr),
                           env = list(pi = Const(pi, prec))))
}

> Pre(exp(pi), 120)
Error in Const(pi, prec) : 'name' must be one of 'pi', 'gamma', 'catalan', 'log2'

Pre <- function(expr, prec){
  do.call(substitute, list(expr = substitute(expr),
                           env = list(pi = Const('pi', prec))))
}

> Pre(exp(pi), 120)
exp(list())
> eval( Pre(exp(pi), 120) )
1 'mpfr' number of precision 120 bits
[1] 23.140692632779269005729086367948547394

So the next step for delivering Ravi's request would be to add the rest of the 'Const' options:

Pre <- function(expr, prec){
  do.call(substitute, list(expr = substitute(expr),
                           env = list(pi = Const('pi', prec),
                                      catalan = Const('catalan', prec),
                                      log2 = Const('log2', prec),
                                      gamma = Const('gamma', prec))))
}

> eval(Pre(exp(gamma)))
Error in stopifnot(is.numeric(prec)) : argument "prec" is missing, with no default
> eval(Pre(exp(gamma), 120))
1 'mpfr' number of precision 120 bits
[1] 1.781072417990197985236504103107179549

> This works:
>
> do.call(substitute, list(expr = substitute(expr), env = list(pi = quote(Const(pi, 120)))))
>
> because it never evaluates the Const function, it just returns some code that you could modify more if you liked.
>
> Duncan Murdoch

David Winsemius, MD
Alameda, CA, USA

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
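[A tiny, Rmpfr-free sketch of Duncan's two points about substitute(), added here for illustration; quote() keeps Const() unevaluated, so nothing needs to be installed to run it.]

f <- function(expr) substitute(expr)
f(exp(pi))
## exp(pi)   -- pulls the expression out of the promise

do.call(substitute,
        list(quote(exp(pi)), list(pi = quote(Const("pi", 120)))))
## exp(Const("pi", 120))   -- substitutes into an explicit expression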
Re: [R] : Ramanujan and the accuracy of floating point computations - using Rmpfr in R
Thank you all. I did think about declaring `pi' as a special constant, but for some reason didn't actually try it. Would it be easy to have mpfr() written such that its argument is automatically of extended precision? In other words, if I just called mpfr(exp(sqrt(163)*pi), 120), then all the constants, 163 and pi, would automatically be of 120 bits precision. Is this easy to do?
Best, Ravi

From: David Winsemius
Sent: Friday, July 3, 2015 2:06 PM
To: John Nash
Cc: r-help; Ravi Varadhan
Subject: Re: [R] : Ramanujan and the accuracy of floating point computations - using Rmpfr in R

> On Jul 3, 2015, at 8:08 AM, John Nash wrote:
>
> Third try -- I unsubscribed and re-subscribed. Sorry to Ravi for extra traffic.
>
> In case anyone gets a duplicate, R-help bounced my msg "from non-member" (I changed server, but not email yesterday, so possibly something changed enough). Trying again.
>
> JN
>
> I got the Wolfram answer as follows:
>
> library(Rmpfr)
> n163 <- mpfr(163, 500)
> n163
> pi500 <- mpfr(pi, 500)
> pi500
> pitan <- mpfr(4, 500)*atan(mpfr(1,500))
> pitan
> pitan-pi500
> r500 <- exp(sqrt(n163)*pitan)
> r500
> check <- "262537412640768743.25007259719818568887935385..."
> savehistory("jnramanujan.R")
>
> Note that I used 4*atan(1) to get pi.

RK got it right by following the example in the help page for mpfr: Const("pi", 120). The R `pi` constant is not recognized by mpfr as being anything other than another double. There are four special values that mpfr recognizes.

-- Best; David

> It seems that may be important, rather than converting.
>
> JN
>
> On 15-07-03 06:00 AM, r-help-requ...@r-project.org wrote:
>
>> Message: 40
>> Date: Thu, 2 Jul 2015 22:38:45 +
>> From: Ravi Varadhan
>> To: "'Richard M. Heiberger'", Aditya Singh
>> Cc: r-help
>> Subject: Re: [R] : Ramanujan and the accuracy of floating point computations - using Rmpfr in R
>> Message-ID: <14ad39aaf6a542849bbf3f62a0c2f...@dom-eb1-2013.win.ad.jhu.edu>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi Rich,
>>
>> The Wolfram answer is correct.
>> http://mathworld.wolfram.com/RamanujanConstant.html
>>
>> There is no code for Wolfram alpha. You just go to their web engine and plug in the expression and it will give you the answer.
>> http://www.wolframalpha.com/
>>
>> I am not sure that the precedence matters in Rmpfr. Even if it does, the answer you get is still wrong, as you showed.
>>
>> Thanks,
>> Ravi
>>
>> -----Original Message-----
>> From: Richard M. Heiberger [mailto:r...@temple.edu]
>> Sent: Thursday, July 02, 2015 6:30 PM
>> To: Aditya Singh
>> Cc: Ravi Varadhan; r-help
>> Subject: Re: [R] : Ramanujan and the accuracy of floating point computations - using Rmpfr in R
>>
>> There is a precedence error in your R attempt. You need to convert 163 to 120 bits first, before taking its square root.
>>
>>>> exp(sqrt(mpfr(163, 120)) * mpfr(pi, 120))
>> 1 'mpfr' number of precision 120 bits
>> [1] 262537412640768333.51635812597335712954
>>
>> ## just the last four characters to the left of the decimal point.
>>>> tmp <- c(baseR=8256, Wolfram=8744, Rmpfr=8333, wrongRmpfr=7837)
>>>> tmp-tmp[2]
>>      baseR    Wolfram      Rmpfr wrongRmpfr
>>       -488          0       -411       -907
>>
>> You didn't give the Wolfram alpha code you used. There is no way of verifying the correct value from your email. Please check that you didn't have a similar precedence error in that code.
>>
>> Rich
>>
>> On Thu, Jul 2, 2015 at 2:02 PM, Aditya Singh via R-help wrote:
>>>> Ravi
>>>>
>>>> I am a chemical engineer by training. Is there not something like the law of corresponding states in numerical analysis?
>>>>
>>>> Aditya
>>>>
>>>> -- On Thu 2 Jul, 2015 7:28 AM PDT Ravi Varadhan wrote:
>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Ramanujan supposedly discovered that the number, 163, has this interesting property that exp(sqrt(163)*pi), which is obviously a transcendental number, is real close to an integer (close to 10^(-12)).
>>>>
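[Consolidating the thread's resolution into one minimal sketch: promote 163 to extended precision before taking its square root, and use a genuinely high-precision pi, either Const("pi", 120) or 4*atan(mpfr(1, 120)), instead of the double-precision constant. The digits printed will depend on the precision requested.]

library(Rmpfr)
exp(sqrt(mpfr(163, 120)) * Const("pi", 120))
exp(sqrt(mpfr(163, 120)) * (4 * atan(mpfr(1, 120))))   # same value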
Re: [R] : Ramanujan and the accuracy of floating point computations - using Rmpfr in R
Hi Rich,
The Wolfram answer is correct.
http://mathworld.wolfram.com/RamanujanConstant.html

There is no code for Wolfram alpha. You just go to their web engine and plug in the expression and it will give you the answer.
http://www.wolframalpha.com/

I am not sure that the precedence matters in Rmpfr. Even if it does, the answer you get is still wrong, as you showed.
Thanks,
Ravi

-----Original Message-----
From: Richard M. Heiberger [mailto:r...@temple.edu]
Sent: Thursday, July 02, 2015 6:30 PM
To: Aditya Singh
Cc: Ravi Varadhan; r-help
Subject: Re: [R] : Ramanujan and the accuracy of floating point computations - using Rmpfr in R

There is a precedence error in your R attempt. You need to convert 163 to 120 bits first, before taking its square root.

> exp(sqrt(mpfr(163, 120)) * mpfr(pi, 120))
1 'mpfr' number of precision 120 bits
[1] 262537412640768333.51635812597335712954

## just the last four characters to the left of the decimal point.
> tmp <- c(baseR=8256, Wolfram=8744, Rmpfr=8333, wrongRmpfr=7837)
> tmp-tmp[2]
     baseR    Wolfram      Rmpfr wrongRmpfr
      -488          0       -411       -907

You didn't give the Wolfram alpha code you used. There is no way of verifying the correct value from your email. Please check that you didn't have a similar precedence error in that code.

Rich

On Thu, Jul 2, 2015 at 2:02 PM, Aditya Singh via R-help wrote:
>
> Ravi
>
> I am a chemical engineer by training. Is there not something like the law of corresponding states in numerical analysis?
>
> Aditya
>
> -- On Thu 2 Jul, 2015 7:28 AM PDT Ravi Varadhan wrote:
>
>> Hi,
>>
>> Ramanujan supposedly discovered that the number, 163, has this interesting property that exp(sqrt(163)*pi), which is obviously a transcendental number, is real close to an integer (close to 10^(-12)).
>>
>> If I compute this using the Wolfram alpha engine, I get:
>> 262537412640768743.25007259719818568887935385...
>>
>> When I do this in R 3.1.1 (64-bit windows), I get:
>> 262537412640768256.
>>
>> The absolute error between the exact and R's value is 488, with a relative error of about 1.9x10^(-15).
>>
>> In order to replicate Wolfram Alpha, I tried doing this in "Rmpfr" but I am unable to get accurate results:
>>
>> library(Rmpfr)
>>
>>> exp(sqrt(163) * mpfr(pi, 120))
>> 1 'mpfr' number of precision 120 bits
>> [1] 262537412640767837.08771354274620169031
>>
>> The above answer is not only inaccurate, but it is actually worse than the answer using the usual double precision. Any thoughts as to what I am doing wrong?
>>
>> Thank you,
>> Ravi
>>
>> [[alternative HTML version deleted]]
>>
>> __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Ramanujan and the accuracy of floating point computations - using Rmpfr in R
Hi,
Ramanujan supposedly discovered that the number, 163, has this interesting property that exp(sqrt(163)*pi), which is obviously a transcendental number, is real close to an integer (close to 10^(-12)).

If I compute this using the Wolfram alpha engine, I get:
262537412640768743.25007259719818568887935385...

When I do this in R 3.1.1 (64-bit windows), I get:
262537412640768256.

The absolute error between the exact and R's value is 488, with a relative error of about 1.9x10^(-15).

In order to replicate Wolfram Alpha, I tried doing this in "Rmpfr" but I am unable to get accurate results:

library(Rmpfr)

> exp(sqrt(163) * mpfr(pi, 120))
1 'mpfr' number of precision 120 bits
[1] 262537412640767837.08771354274620169031

The above answer is not only inaccurate, but it is actually worse than the answer using the usual double precision. Any thoughts as to what I am doing wrong?
Thank you,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] multiple parameter optimization with optim()
Harold,
Obviously the bottleneck is your objective function fn(). I have sped up your function by a factor of about 2.4 by using `outer' instead of sapply. I think it can be sped up much more, but I couldn't figure that out without spending a lot of time. I am sure someone on this list-serv can come up with a cleverer way to program the objective function.

pl3 <- function(dat, Q, startVal = NULL, ...){
  # startVal holds both a and b, hence 2 * ncol(dat) parameters
  if(!is.null(startVal) && length(startVal) != 2 * ncol(dat)){
    stop("Length of argument startVal not equal to the number of parameters estimated")
  }
  if(!is.null(startVal)){
    startVal <- startVal
  } else {
    p <- colMeans(dat)
    startValA <- rep(1, ncol(dat))
    startValB <- as.vector(log((1 - p)/p))
    startVal <- c(startValA, startValB)
  }
  rr1 <- rr2 <- numeric(nrow(dat))
  qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma = 1)  # from the statmod package
  nds <- qq$nodes
  wts <- qq$weights
  Q <- length(qq$nodes)
  dat <- as.matrix(dat)
  fn <- function(params){
    a <- params[1:20]   # hard-coded for 20 items (ncol(dat) == 20)
    b <- params[21:40]
    for(j in 1:length(rr1)){
      rr1[j] <- sum(wts * exp(colSums(outer(dat[j, ], nds,
        function(x, y) dbinom(x, 1, 1/(1 + exp(-1.7 * a * (y - b))), log = TRUE)))))
    }
    -sum(log(rr1))
  }
  # opt <- optim(startVal, fn, method = "BFGS", hessian = TRUE)
  opt <- nlminb(startVal, fn, control = list(trace = 1, rel.tol = 1.e-06, iter.max = 50))
  # opt <- spg(startVal, fn)
  opt
  # list("coefficients" = opt$par, "LogLik" = -opt$value,
  #      "Std.Error" = sqrt(diag(solve(opt$hessian))))
}

Hope this is helpful,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to use `eval(parse(text))' in nlme::lme
Yes, this is a very important point. Thank you, Bill.
Best,
Ravi

From: William Dunlap
Sent: Friday, February 13, 2015 4:56 PM
To: Ravi Varadhan
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] Unable to use `eval(parse(text))' in nlme::lme

> ff <- reformulate(termlabels=c("time","as.factor(gvhd)"), response=yname, intercept=TRUE)

If the right hand side of the formula were more complicated than an additive list of terms, say '~ time * as.factor(gvhd)', or the left side were more than a name, say 'log(yname)', you could use bquote() instead of reformulate. E.g.,

> formulaTemplate <- log(.(responseName)) ~ time * as.factor(gvhd)
> lapply(c("y1","y2","y3"),
+        function(yname) do.call(bquote, list(formulaTemplate, list(responseName = as.name(yname)))))
[[1]]
log(y1) ~ time * as.factor(gvhd)

[[2]]
log(y2) ~ time * as.factor(gvhd)

[[3]]
log(y3) ~ time * as.factor(gvhd)

I used 'do.call' because bquote does not evaluate its first argument, but we need to evaluate the name 'formulaTemplate'. You could avoid that by putting the template verbatim in the call to bquote, as in

lapply(c("y1","y2","y3"),
       function(yname) bquote(log(.(responseName)) ~ time * as.factor(gvhd),
                              list(responseName = as.name(yname))))

I like the do.call method because I can bury it in a function and forget about it. bquote() retains the environment of the formula template.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Feb 9, 2015 at 6:44 AM, Ravi Varadhan wrote:

Thanks to Rolf, Duncan, and Ben. Ben, your suggestion worked (with a minor correction of concatenating the termlabels into a vector). Here is the solution for those interested.

ff <- reformulate(termlabels=c("time","as.factor(gvhd)"), response=yname, intercept=TRUE)
dd <- subset(labdata2, Transplant_type!=0 & time >0)
lme(ff, random=~1|Patient, data=dd, correlation=corAR1(), na.action=na.omit)

Best,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor
Department of Oncology
Division of Biostatistics & Bioinformatics
Johns Hopkins University
550 N. Broadway
Baltimore, MD 21205
410-502-2619

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to use `eval(parse(text))' in nlme::lme
Thanks to Rolf, Duncan, and Ben. Ben, your suggestion worked (with a minor correction of concatenating the termlabels into a vector). Here is the solution for those interested.

ff <- reformulate(termlabels=c("time","as.factor(gvhd)"), response=yname, intercept=TRUE)
dd <- subset(labdata2, Transplant_type!=0 & time >0)
lme(ff, random=~1|Patient, data=dd, correlation=corAR1(), na.action=na.omit)

Best,
Ravi

Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
Associate Professor
Department of Oncology
Division of Biostatistics & Bioinformatics
Johns Hopkins University
550 N. Broadway
Baltimore, MD 21205
410-502-2619

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
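[A hedged sketch of wrapping the reformulate() solution so lme() can be looped over many responses. dd, Patient, time, and gvhd are the poster's objects; y1, y2, y3 are hypothetical response names.]

library(nlme)

fit_one <- function(yname, dat) {
  ff <- reformulate(c("time", "as.factor(gvhd)"), response = yname)
  lme(ff, random = ~ 1 | Patient, data = dat,
      correlation = corAR1(), na.action = na.omit)
}

# fits <- lapply(c("y1", "y2", "y3"), fit_one, dat = dd)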
[R] Unable to use `eval(parse(text))' in nlme::lme
Hi,
I would like to run lme() on a number of response variables in a dataframe in an automatic manner. But when I use eval(parse(text=yname)) to denote the LHS of the formula in lme(), I get the following error message:

> require(nlme)
> mod2 <- with(subset(labdata2, Transplant_type!=0 & time >0),
+   lme(eval(parse(text=yname)) ~ time + as.factor(gvhd), random = ~1|Patient,
+       correlation = corAR1(), method="ML", na.action=na.omit))
Error in model.frame.default(formula = ~Patient + yname + time + gvhd, :
  variable lengths differ (found for 'yname')

The same usage works well in lme4::lmer without any problems. It seems that there is a problem in how the formula object is evaluated in lme(). Is there an alternative way to do this?
Thank you,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Re-order levels of a categorical (factor) variable
Dear Bert, Martin, Bill,
Thank you very much for the help. Bert & Martin's solutions solved my problem. Would it be useful to add an example like this to the help page on ?factor or ?levels?
Thanks again, Ravi

From: Martin Maechler
Sent: Thursday, January 22, 2015 10:29 AM
To: Bert Gunter
Cc: William Dunlap; r-help@r-project.org; Ravi Varadhan
Subject: Re: [R] Re-order levels of a categorical (factor) variable

>>>>> Bert Gunter on Wed, 21 Jan 2015 18:52:12 -0800 writes:

> Bill/Ravi:
> I believe the problem is that the factor is automatically created when a data frame is created by read.table(). By default, the levels are lexicographically ordered. The following reproduces the problem and gives a solution.

>> library(lattice)
>> z <- data.frame(y = 1:9, x = rep(c("pre", "day2","day10")))
>> xyplot(y~x, data=z)  ## x axis order is day10, day2, pre
>> levels(z$x)
> [1] "day10" "day2"  "pre"
>> z$x <- factor(as.character(z$x), levels=c(levels(z$x)[3:1]))  ## explicitly defines level order
>> xyplot(y~x, data=z)  ## desired plot

Indeed, thank you, Bert, and using levels(.) <- * does *not* work... (as I first thought). However, slightly shorter and easier, and maybe even easier to remember, than

z$x <- factor(as.character(z$x), levels = c(levels(z$x)[3:1]))  ## def. level order

is

z$x <- factor(z$x, levels = levels(z$x)[3:1])  ## def. level order

Martin Maechler, ETH Zurich

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Re-order levels of a categorical (factor) variable
Hi,
I have a fairly elementary problem that I am unable to figure out. I have a continuous variable, Y, repeatedly measured at multiple times, T. The variable T, however, is coded as a factor variable having these levels: c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365"). When I plot the boxplot, e.g., boxplot(Y ~ T), it displays the boxes in this order: c("Day 180", "Day 30", "Day 365", "Day 60", "Presurgery"). Is there a way to control the order of the boxes such that they are plotted in a particular order that I want, for example: c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365")? More generally, is there a simple way to redefine the ordering of the categorical variable such that this ordering will be used in whatever operation is done? I looked at relevel, reorder, etc., but they did not seem to be applicable to my problem. Thanks for any help.
Best,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
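[A hedged, self-contained illustration of the fix suggested in the reply above, using made-up data in place of the poster's Y and T. The poster's variable name T is kept for continuity, although it shadows the TRUE shorthand.]

lev <- c("Presurgery", "Day 30", "Day 60", "Day 180", "Day 365")
set.seed(1)
T <- sample(lev, 100, replace = TRUE)   # character, so factor() sorts it
Y <- rnorm(100)

boxplot(Y ~ factor(T))                  # boxes in lexicographic order
boxplot(Y ~ factor(T, levels = lev))    # boxes in the desired order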
Re: [R] R 3.1.2 mle2() function on Windows 7 Error and multiple solutions
A more important point that I want to make is that I find few people taking advantage of the "comparative evaluation" or benchmarking ability of optimx. There is no "uniformly best" optimizer for all problems. Different ones turn out to perform better for different problems, and it is quite difficult to know a priori which one would be the best for "my" problem. Thus, the benchmarking capability provided by optimx is a powerful feature.
Ravi

From: Ben Bolker
Sent: Thursday, January 15, 2015 9:29 AM
To: Ravi Varadhan; R-Help
Cc: malqura...@ksu.edu.sa
Subject: Re: R 3.1.2 mle2() function on Windows 7 Error and multiple solutions

For what it's worth, you can use either nlminb (directly) or optimx within the mle2 wrapper by specifying the 'optimizer' parameter ... this gives you flexibility in optimization along with the convenience of mle2 (likelihood ratio tests via anova(), likelihood profiling, etc.)

On 15-01-15 09:26 AM, Ravi Varadhan wrote:
> Hi,
>
> I tried your problem with the optimx package. I found a better solution than that found by mle2.
>
> library(optimx)
>
> # the objective function needs to be re-written
> LL2 <- function(par, y) {
>   lambda <- par[1]
>   alpha <- par[2]
>   beta <- par[3]
>   R <- Nweibull(y, lambda, alpha, beta)
>   -sum(log(R))
> }
>
> optimx(fn=LL2, par=c(.01, 325, .8), y=y, lower=c(.1, .1, .1),
>        upper=c(Inf, Inf, Inf), control=list(all.methods=TRUE))
>
> # Look at the solution found by `nlminb' and `nmkb'. This is the optimal one.
> # This log-likelihood is larger than that of mle2 and other optimizers in optimx.
>
> If this solution is not what you are looking for, your problem may be poorly scaled. First, make sure that the likelihood is coded correctly. If it is correct, then you may need to improve the scaling of the problem.
>
> Hope this is helpful,
> Ravi

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
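[A self-contained sketch of the benchmarking feature praised above; the Rosenbrock function here is a stand-in for the poster's likelihood, not code from the thread.]

library(optimx)

fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2   # Rosenbrock test function

# one call runs every available optimizer and tabulates the results,
# making it easy to compare solutions, function counts, and timings
optimx(par = c(-1.2, 1), fn = fr, control = list(all.methods = TRUE))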
Re: [R] R 3.1.2 mle2() function on Windows 7 Error and multiple solutions
Hi,
I tried your problem with the optimx package. I found a better solution than that found by mle2.

library(optimx)

# the objective function needs to be re-written
LL2 <- function(par, y) {
  lambda <- par[1]
  alpha <- par[2]
  beta <- par[3]
  R <- Nweibull(y, lambda, alpha, beta)
  -sum(log(R))
}

optimx(fn=LL2, par=c(.01, 325, .8), y=y, lower=c(.1, .1, .1),
       upper=c(Inf, Inf, Inf), control=list(all.methods=TRUE))

# Look at the solution found by `nlminb' and `nmkb'. This is the optimal one.
# This log-likelihood is larger than that of mle2 and other optimizers in optimx.

If this solution is not what you are looking for, your problem may be poorly scaled. First, make sure that the likelihood is coded correctly. If it is correct, then you may need to improve the scaling of the problem.

Hope this is helpful,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] short-sale constraint with nonpositive-definite matrix in portfolio optimization
Hi,
You can try a projected gradient approach, which is implemented in the spg() function in the "BB" package. You have to provide a projection function that takes an infeasible matrix as input and returns a feasible (i.e., positive-definite) matrix as output. This kind of matrix projection is quite easy to do, and in fact there are functions in R to do it (e.g., see the posdefify() function in the "sfsmisc" package). There is a recent review article on the spectral projected gradient algorithm in the Journal of Statistical Software: http://www.jstatsoft.org/v60/i03

Hope this is helpful,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
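[A hedged sketch of spg() with a user-supplied projection, assuming the project function is called with the parameter vector alone. Instead of projecting covariance matrices as described above, this toy problem projects portfolio weights onto the simplex {w >= 0, sum(w) = 1}; mu and S are made-up inputs.]

library(BB)

proj_simplex <- function(w) {
  # Euclidean projection onto the probability simplex (sort-based algorithm)
  u <- sort(w, decreasing = TRUE)
  css <- cumsum(u)
  rho <- max(which(u + (1 - css) / seq_along(u) > 0))
  pmax(w + (1 - css[rho]) / rho, 0)
}

mu <- c(0.05, 0.08, 0.12)        # hypothetical expected returns
S <- diag(c(0.02, 0.05, 0.10))   # hypothetical covariance matrix
fn <- function(w) as.numeric(t(w) %*% S %*% w - 0.5 * sum(mu * w))

ans <- spg(par = rep(1/3, 3), fn = fn, project = proj_simplex)
ans$par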
[R] Checking modeling assumptions in a binomial GLMM
Dear All, I am fitting a model for a binary response variable measured repeatedly at multiple visits. I am using the binomial GLMM using the glmer() function in lme4 package. How can I evaluate the model assumptions (e.g., residual diagnostics, adequacy of random effects distribution) for a binomial GLMM? Are there any standard checks that are commonly done? Are there any pedagogical examples or data sets where model assumptions have been examined for binomial GLMMs? Any suggestions/guidance is appreciated. Thank you, Ravi [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
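[No reply is archived here; a hedged sketch of one common starting point for such checks, using lme4's built-in cbpp data as a stand-in for the poster's repeated binary outcomes.]

library(lme4)

gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)

# residual diagnostics: look for trend or fanning in Pearson residuals
plot(fitted(gm), residuals(gm, type = "pearson"),
     xlab = "Fitted probabilities", ylab = "Pearson residuals")

# rough check of the normality assumption for the random intercepts
qqnorm(unlist(ranef(gm)$herd))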
Re: [R] Box-cox transformation
Thank you. It is very helpful.
Ravi

-----Original Message-----
From: Joshua Wiley [mailto:jwiley.ps...@gmail.com]
Sent: Monday, July 07, 2014 4:15 PM
To: Ravi Varadhan
Cc: r-help@r-project.org
Subject: Re: [R] Box-cox transformation

Dear Ravi,

In my previous example, I used the residuals, so:

sum[ (r_i / scaling)^2 ]

If you want to use the deviance from glm, that gives you:

sum[ r_i^2 ]

and since the scaling factor is just a constant for any given lambda, the modification would be:

sum[ r_i^2 ] / ( scaling^2 )

and is given in the modified code below (posted back to R-help in case anyone else has this question).

Hope this helps,
Josh

##
require(MASS)
myp <- function(y, lambda) (y^lambda - 1)/lambda

lambda <- seq(-0.05, 0.45, len = 20)
N <- nrow(quine)
res <- matrix(numeric(0), nrow = length(lambda), 2,
              dimnames = list(NULL, c("Lambda", "LL")))

# scaling constant: the geometric mean of (Days + 1), which enters the
# profile log-likelihood through the Jacobian of the Box-Cox transformation
C <- exp(mean(log(quine$Days + 1)))

for(i in seq_along(lambda)) {
  SS <- deviance(glm(myp(Days + 1, lambda[i]) ~ Eth*Sex*Age*Lrn, data = quine))
  LL <- (- (N/2) * log(SS/((C^lambda[i])^2)))
  res[i, ] <- c(lambda[i], LL)
}

# box cox
boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = lambda)
# add our points on top to verify match
points(res[, 1], res[, 2], pch = 16)
##

On Mon, Jul 7, 2014 at 11:57 PM, Ravi Varadhan wrote:
> Dear Josh,
> Thank you very much. I knew that the scaling had to be adjusted, but was not sure how to do this.
>
> Can you please show me how to do this scaling with `glm'? In other words, how would I scale the deviance from glm?
>
> Thanks,
> Ravi
>
> -----Original Message-----
> From: Joshua Wiley [mailto:jwiley.ps...@gmail.com]
> Sent: Sunday, July 06, 2014 11:34 PM
> To: Ravi Varadhan
> Cc: r-help@r-project.org
> Subject: Re: [R] Box-cox transformation
>
> Hi Ravi,
>
> Deviance is the SS in this case, but you need a normalizing constant adjusted by the lambda to put them on the same scale. I modified your example below to simplify slightly and use the normalization (see the LL line).
>
> Cheers,
> Josh
>
> ##
> require(MASS)
> myp <- function(y, lambda) (y^lambda - 1)/lambda
>
> lambda <- seq(-0.05, 0.45, len = 20)
> N <- nrow(quine)
> res <- matrix(numeric(0), nrow = length(lambda), 2,
>               dimnames = list(NULL, c("Lambda", "LL")))
>
> # scaling constant
> C <- exp(mean(log(quine$Days + 1)))
>
> for(i in seq_along(lambda)) {
>   r <- resid(lm(myp(Days + 1, lambda[i]) ~ Eth*Sex*Age*Lrn, data = quine))
>   LL <- (- (N/2) * log(sum((r/(C^lambda[i]))^2)))
>   res[i, ] <- c(lambda[i], LL)
> }
>
> # box cox
> boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = lambda)
> # add our points on top to verify match
> points(res[, 1], res[, 2], pch = 16)
> ##
>
> On Mon, Jul 7, 2014 at 11:33 AM, Ravi Varadhan wrote:
>> Hi,
>>
>> I am trying to do a Box-Cox transformation, but I am not sure how to do it correctly. Here is an example showing what I am trying:
>>
>> # example from MASS
>> require(MASS)
>> boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine,
>>        lambda = seq(-0.05, 0.45, len = 20))
>>
>> # Here is my attempt at getting the profile likelihood for the Box-Cox parameter
>> lam <- seq(-0.05, 0.45, len = 20)
>> dev <- rep(NA, length=20)
>>
>> for (i in 1:20) {
>>   a <- lam[i]
>>   ans <- glm(((Days+1)^a - 1)/a ~ Eth*Sex*Age*Lrn, family=gaussian, data = quine)
>>   dev[i] <- ans$deviance
>> }
>>
>> plot(lam, dev, type="b", xlab="lambda", ylab="deviance")
>>
>> I am trying to create the profile likelihood for the Box-Cox parameter, but obviously I am not getting it right. I am not sure that ans$deviance is the right thing to do.
>>
>> I would appreciate any guidance.
>>
>> Thanks & Best,
>> Ravi
>>
>> [[alternative HTML version deleted]]
>>
>> __ R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Joshua F. Wiley
> Ph.D. Student, UCLA Department of Psychology http://joshuawiley.com/
> Senior Analyst, Elkhart Group Ltd.
> http://elkhartgroup.com
> Office: 260.673.5518

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Box-cox transformation
Hi,
I am trying to do a Box-Cox transformation, but I am not sure how to do it correctly. Here is an example showing what I am trying:

# example from MASS
require(MASS)
boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine,
       lambda = seq(-0.05, 0.45, len = 20))

# Here is my attempt at getting the profile likelihood for the Box-Cox parameter
lam <- seq(-0.05, 0.45, len = 20)
dev <- rep(NA, length=20)

for (i in 1:20) {
  a <- lam[i]
  ans <- glm(((Days+1)^a - 1)/a ~ Eth*Sex*Age*Lrn, family=gaussian, data = quine)
  dev[i] <- ans$deviance
}

plot(lam, dev, type="b", xlab="lambda", ylab="deviance")

I am trying to create the profile likelihood for the Box-Cox parameter, but obviously I am not getting it right. I am not sure that ans$deviance is the right thing to do.
I would appreciate any guidance.
Thanks & Best,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] A combinatorial assignment problem
Thank you, Dan and Bert.

Bert - Your approach provides a solution. However, it has the undesired property of lumping referees together (I apologize that I did not state this as a condition). In other words, it does not "mix" the referees in some random fashion.

Dan - your approach attempts to have the desired properties, but is not guaranteed to work. Here is a counterexample:

> set.seed(1234)
> a <- assignment(40,15,3)
> table(a)
a
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
10  7 12  7  4 10  8  6  8 13  7  7 11  3  7

Notice that the difference between the maximum and minimum number of candidates per referee is 13 - 3 = 10. Of course, I could increase the number of iterations to get a better solution, but for large K and R this may not converge at all.

Best regards,
Ravi

From: Ravi Varadhan
Sent: Wednesday, April 30, 2014 1:49 PM
To: r-help@r-project.org
Subject: A combinatorial assignment problem

Hi,
I have this problem: K candidates apply for a job. There are R referees available to review their resumes and provide feedback. Suppose that we would like M referees to review each candidate (M < R). How would I assign candidates to referees (or, conversely, referees to candidates)? There are two important cases: (a) K > (R choose M) and (b) K < (R choose M).

Case (a) actually reduces to case (b), so we only have to consider case (b). Without any other constraints, the problem is quite easy to solve. Here is an example that shows this.

require(gtools)
set.seed(12345)
K <- 10   # number of candidates
R <- 7    # number of referees
M <- 3    # overlap, number of referees reviewing each candidate

allcombs <- combinations(R, M, set=TRUE, repeats.allowed=FALSE)
assignment <- allcombs[sample(1:nrow(allcombs), size=K, replace=FALSE), ]
assignment

> assignment
      [,1] [,2] [,3]
 [1,]    3    4    5
 [2,]    3    5    7
 [3,]    5    6    7
 [4,]    3    5    6
 [5,]    1    6    7
 [6,]    1    2    7
 [7,]    1    4    5
 [8,]    3    6    7
 [9,]    2    4    5
[10,]    4    5    7

Here each row stands for a candidate and the columns give the referees who review that candidate. Of course, there are some constraints that make the assignment challenging. We would like to have an even distribution of the number of candidates reviewed by each referee. For example, the above code produces an assignment where referee #2 gets only 2 candidates and referee #5 gets 7 candidates. We would like to avoid such uneven assignments.

> table(assignment)
assignment
1 2 3 4 5 6 7
3 2 4 4 7 4 6

Note that in this example there are 35 possible triplets of referees and 10 candidates. Therefore, a perfectly even assignment is not possible. I tried some obvious, greedy search methods but they are not guaranteed to work. Any hints or suggestions would be greatly appreciated.
Best,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] A combinatorial assignment problem
Thanks, Bert. I have written this simple code, which is crude but seems to do a decent job. It works perfectly when M is a factor of R. Otherwise, it gives decent balance (of course, balance is not guaranteed). I guess it is possible to take the results, which are somewhat unbalanced, and then reshuffle them semi-randomly to get better balance. Any improvements are appreciated!

assign <- function(K, R, M, iseed=1234) {   # note: this shadows base::assign()
  assignment <- matrix(NA, K, M)
  N <- R %/% M
  Nrem <- R %% M
  iseq <- seq(1, K, N)
  for (i in iseq){
    size <- ifelse(K-i >= N, R-Nrem, M*(K-i+1))
    sel <- sample(1:R, size=size, replace=FALSE)
    end <- min((i+N-1), K)
    assignment[i:end, ] <- sel
  }
  assignment
}

sol <- assign(40, 16, 3)
table(sol)

Thanks,
Ravi

-----Original Message-----
From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Thursday, May 01, 2014 10:46 AM
To: Ravi Varadhan
Cc: r-help@r-project.org; djnordl...@frontier.com
Subject: Re: A combinatorial assignment problem

Ravi:

You cannot simultaneously have balance and guarantee random mixing. That is, you would need to specify precisely what you mean by balance and random mixing in this context, as these terms are now subjective and undefined. You could, of course, randomize the initial assignment of referees to positions, and note also that some mixing of referee groups would occur if m does not divide r evenly. More explicitly, note that a very fast way to generate the groups I described is:

rmk <- function(nrefs, size, ncands){
  n <- ncands * size
  matrix(rep(seq_len(nrefs), n %/% nrefs + 1)[seq_len(n)], nrow = ncands, byrow = TRUE)
}
## corner case checks and adjustments need to be made to this, of course.

> rmk(7,3,10)
      [,1] [,2] [,3]
 [1,]    1    2    3
 [2,]    4    5    6
 [3,]    7    1    2
 [4,]    3    4    5
 [5,]    6    7    1
 [6,]    2    3    4
 [7,]    5    6    7
 [8,]    1    2    3
 [9,]    4    5    6
[10,]    7    1    2

You could then modify this by randomly reordering the referees every time a complete cycle of the groupings occurred, i.e. after each nrefs/gcd(nrefs,size) candidates = rows. This would give variable groups and even assignment. This algorithm could be further fiddled with by choosing nrefs and size to assure they are relatively prime and then adding and removing further refs randomly during the cycling, etc.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." H. Gilbert Welch

On Thu, May 1, 2014 at 6:09 AM, Ravi Varadhan wrote:
> Thank you, Dan and Bert.
>
> Bert - Your approach provides a solution. However, it has the undesired property of lumping referees together (I apologize that I did not state this as a condition). In other words, it does not "mix" the referees in some random fashion.
>
> Dan - your approach attempts to have the desired properties, but is not guaranteed to work. Here is a counterexample:
>
>> set.seed(1234)
>> a <- assignment(40,15,3)
>> table(a)
> a
>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
> 10  7 12  7  4 10  8  6  8 13  7  7 11  3  7
>
> Notice that the difference between the maximum and minimum number of candidates per referee is 13 - 3 = 10. Of course, I could increase the number of iterations to get a better solution, but for large K and R this may not converge at all.
>
> Best regards,
> Ravi
>
> From: Ravi Varadhan
> Sent: Wednesday, April 30, 2014 1:49 PM
> To: r-help@r-project.org
> Subject: A combinatorial assignment problem
>
> Hi,
>
> I have this problem: K candidates apply for a job. There are R referees available to review their resumes and provide feedback. Suppose that we would like M referees to review each candidate (M < R). How would I assign candidates to referees (or, conversely, referees to candidates)? There are two important cases: (a) K > (R choose M) and (b) K < (R choose M).
>
> Case (a) actually reduces to case (b), so we only have to consider case (b). Without any other constraints, the problem is quite easy to solve. Here is an example that shows this.
>
> require(gtools)
> set.seed(12345)
> K <- 10   # number of candidates
> R <- 7    # number of referees
> M <- 3    # overlap, number of referees reviewing each candidate
>
> allcombs <- combinations(R, M, set=TRUE, repeats.allowed=FALSE)
> assignment <- allcombs[sample(1:nrow(allcombs), size=K, replace=FALSE), ]
> assignment
>
>> assignment
>       [,1] [,2] [,3]
> [1,
[R] A combinatorial assignment problem
Hi,
I have this problem: K candidates apply for a job. There are R referees available to review their resumes and provide feedback. Suppose that we would like M referees to review each candidate (M < R). How would I assign candidates to referees (or, conversely, referees to candidates)? There are two important cases: (a) K > (R choose M) and (b) K < (R choose M).

Case (a) actually reduces to case (b), so we only have to consider case (b). Without any other constraints, the problem is quite easy to solve. Here is an example that shows this.

require(gtools)
set.seed(12345)
K <- 10   # number of candidates
R <- 7    # number of referees
M <- 3    # overlap, number of referees reviewing each candidate

allcombs <- combinations(R, M, set=TRUE, repeats.allowed=FALSE)
assignment <- allcombs[sample(1:nrow(allcombs), size=K, replace=FALSE), ]
assignment

> assignment
      [,1] [,2] [,3]
 [1,]    3    4    5
 [2,]    3    5    7
 [3,]    5    6    7
 [4,]    3    5    6
 [5,]    1    6    7
 [6,]    1    2    7
 [7,]    1    4    5
 [8,]    3    6    7
 [9,]    2    4    5
[10,]    4    5    7

Here each row stands for a candidate and the columns give the referees who review that candidate. Of course, there are some constraints that make the assignment challenging. We would like to have an even distribution of the number of candidates reviewed by each referee. For example, the above code produces an assignment where referee #2 gets only 2 candidates and referee #5 gets 7 candidates. We would like to avoid such uneven assignments.

> table(assignment)
assignment
1 2 3 4 5 6 7
3 2 4 4 7 4 6

Note that in this example there are 35 possible triplets of referees and 10 candidates. Therefore, a perfectly even assignment is not possible. I tried some obvious, greedy search methods but they are not guaranteed to work. Any hints or suggestions would be greatly appreciated.
Best,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
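[A hedged sketch of one way to combine balance with mixing, a variation on Bert's cycling idea from the follow-ups rather than code from the thread itself: fill the K*M review slots from repeated complete random shufflings of the referee pool, so that referee loads differ by at most one.]

assign_balanced <- function(K, R, M, seed = 1234) {
  set.seed(seed)
  slots <- K * M
  # concatenate as many full random permutations of 1:R as needed
  pool <- unlist(lapply(seq_len(ceiling(slots / R)), function(i) sample(R)))
  matrix(pool[seq_len(slots)], nrow = K, ncol = M, byrow = TRUE)
}

# Caveat: when M does not divide R, a candidate's row can straddle two
# shuffles and may then contain a duplicated referee; a repair step would
# be needed in that case.
a <- assign_balanced(40, 15, 3)
table(a)   # every referee gets exactly 120/15 = 8 candidates here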
[R] A couple of issues with lattice::bwplot
Hi All,
I am using lattice::bwplot to plot the results of a subgroup analysis for different simulation scenarios. I have 4 subgroups, 5 methods of estimating treatment effect in subgroups, and 4 scenarios. I am plotting the subgroup effect and its upper and lower CI. I have 2 issues that I am unable to resolve:

1. How can I control the order in which the panels are plotted? For example, I would like the bottom and top rows to be switched, and the 2nd and 3rd rows to be switched.

2. How can I plot a point within each panel at a fixed coordinate?

My code is as follows:

library(lattice)
trellis.par.set("box.rectangle", list(lty=1, lwd=2))
trellis.par.set("box.umbrella", list(lty=1, lwd=2))

n.param <- 16
est.method <- c("Bayes-DS", "EB", "MLE-DS", "Naive", "Stand")
est.type <- c("Point", "LCL", "UCL")
est.names <- c("Int.", paste0("X", 2:n.param))   # (overwritten on the next line)
est.names <- c(outer(c("X0=0", "X0=1", "X1=0", "X1=1"),
                     paste0(", scenario", 1:4), function(x,y) paste0(x, y)))
n.methods <- length(est.method)
n.types <- length(est.type)

estimates <- data.frame(
  Tx.effect = sort(rnorm(n.param*n.methods*n.types, mean=0.5, sd=0.2)),
  method = rep(est.method, each=n.param*n.types),
  name = rep(est.names, times=n.types*n.methods),
  type = rep(est.type, times=n.param*n.methods))

truth <- runif(16)   # I would like to plot these points, one per panel

bwplot(method ~ Tx.effect | name, data=estimates, box.width=0, coef=0, pch=22)

I would greatly appreciate any help.
Thank you,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
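[Both questions go unanswered in this archive. A hedged sketch of the usual lattice approach, relying only on the objects defined above: index.cond permutes the order in which the levels of `name' are laid out (reversing the four rows of a 4 x 4 layout swaps top with bottom and the two middle rows), and a custom panel function draws one extra point per panel; packet.number() indexes the panel's underlying level combination, so `truth' stays correctly matched even after reordering.]

bwplot(method ~ Tx.effect | name, data = estimates,
       box.width = 0, coef = 0, pch = 22, layout = c(4, 4),
       index.cond = list(c(13:16, 9:12, 5:8, 1:4)),   # reverse the 4 rows
       panel = function(x, y, ...) {
         panel.bwplot(x, y, ...)
         # the y-coordinate 3 (the 3rd method) is arbitrary, for illustration
         panel.points(truth[packet.number()], 3, pch = 17, col = "red")
       })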
[R] Unable to Install a package from source in Windows
Hi,
I am using the following R version:

> version
               _
platform       i386-w64-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          3
minor          0.1
year           2013
month          05
day            16
svn rev        62743
language       R
version.string R version 3.0.1 (2013-05-16)
nickname       Good Sport

I was able to build the source of a package without any errors or warnings. However, when I try to install the package, I get the following error messages. Can someone point me to what I am doing wrong?
Thanks in advance,
Ravi

> install.packages("H:/Documents/computations/BB_2014.01-1.tar.gz", repos=NULL, type="source")
Installing package into '\\homer.win.ad.jhu.edu/users$/rvaradh1/Documents/R/win-library/3.0'
(as 'lib' is unspecified)
'\\homer.win.ad.jhu.edu\users$\rvaradh1\Documents'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported. Defaulting to Windows directory.
* installing *source* package 'BB' ...
** R
** demo
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
Warning in library(pkg_name, lib.loc = lib, character.only = TRUE, logical.return = TRUE) :
  there is no package called 'BB'
Error: loading failed
Execution halted
*** arch - x64
Warning in library(pkg_name, lib.loc = lib, character.only = TRUE, logical.return = TRUE) :
  there is no package called 'BB'
Error: loading failed
Execution halted
ERROR: loading failed for 'i386', 'x64'
* removing '\\homer.win.ad.jhu.edu/users$/rvaradh1/Documents/R/win-library/3.0/BB'
* restoring previous '\\homer.win.ad.jhu.edu/users$/rvaradh1/Documents/R/win-library/3.0/BB'
Warning messages:
1: running command '"C:/PROGRA~1/R/R-30~1.1/bin/i386/R" CMD INSTALL -l "\\homer.win.ad.jhu.edu\users$\rvaradh1\Documents\R\win-library\3.0" "H:/Documents/computations/BB_2014.01-1.tar.gz"' had status 1
2: In install.packages("H:/Documents/computations/BB_2014.01-1.tar.gz", :
  installation of package 'H:/Documents/computations/BB_2014.01-1.tar.gz' had non-zero exit status

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
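[No reply is archived here. The failure appears to stem from the default library living on a UNC network share, which cmd.exe cannot use as a working directory ("UNC paths are not supported" above). A hedged workaround is to install into, and load from, a library on a local drive; the path below is hypothetical.]

dir.create("C:/Rlibs", showWarnings = FALSE)   # hypothetical local library
install.packages("H:/Documents/computations/BB_2014.01-1.tar.gz",
                 repos = NULL, type = "source", lib = "C:/Rlibs")
library(BB, lib.loc = "C:/Rlibs")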
Re: [R] convergence=0 in optim and nlminb is real?
The optimization algorithms did converge to a limit point, but not to a stationary point, i.e., a point in parameter space where the first- and second-order KKT conditions are satisfied. If you check the gradient at the solution, you will see that it is quite large in magnitude relative to 0. So, why did the algorithms declare convergence? Convergence is based on the absolute change in function value and/or the relative change in parameter values between consecutive iterations; this does not ensure that the KKT conditions are satisfied.

Now, to the real issue: your problem is ill-posed. As you can tell from the eigenvalues of the Hessian, they vary over 9 orders of magnitude. This may indicate a problem with the data, in that the log-likelihood is over-parametrized relative to the information in the data set. Get a better data set or formulate a simpler model, and the problem will disappear.
Best,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
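[A hedged sketch of the check described above; `fn' (the negative log-likelihood) and `fit' (the returned optim/nlminb object) are placeholders for the poster's objects.]

library(numDeriv)

g <- grad(fn, fit$par)        # first-order KKT condition: should be near 0
H <- hessian(fn, fit$par)

max(abs(g))                                   # large magnitude => not stationary
range(eigen(H, only.values = TRUE)$values)    # wide spread => ill-conditioning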
Re: [R] [optim/bbmle] function returns NA
Carlos,
There are likely several problems with your likelihood. You should check it carefully before you do any optimization. It seems to me that you have box constraints on the parameters. The way you are enforcing them is not correct. I would prefer to use an optimization algorithm that can handle box constraints rather than an artificial mechanism such as what you are trying to do:

if (r<=0 | alpha<=0 | s<=0 | beta<=0) return(NaN)

Even here, a better approach would be:

if (r<=0 | alpha<=0 | s<=0 | beta<=0) return(-.Machine$double.xmax)

I also notice that you do not use `tx' in your `g' function. There are likely a number of other issues as well. You may try the nmkb() function in the "dfoptim" package. It can handle box constraints and typically tends to perform a bit better than optim's Nelder-Mead for unconstrained problems.
Hope this is helpful,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
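[A self-contained illustration of handling box constraints directly with dfoptim::nmkb, instead of returning NaN; the objective here is a made-up smooth function, not Carlos's likelihood.]

library(dfoptim)

fn <- function(p) sum((p - c(0.5, 2, 1, 3))^2)   # hypothetical objective

# note: nmkb requires a starting point strictly inside the bounds
ans <- nmkb(par = c(1, 1, 1, 1), fn = fn,
            lower = rep(1e-8, 4), upper = rep(10, 4))
ans$par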
Re: [R] GARCH Optimization Problems
Hi Tony,
Did you try the `auglag' algorithm in the "alabama" package?
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] While using R CMD check: LaTex error: File `inconsolata.sty' not found
Hi,
While using R CMD check I get the following LaTeX error message, which occurs when creating the PDF version of the manual:

LaTeX error: File `inconsolata.sty' not found

I am using Windows 7 (64-bit) and R 3.0.1. I have MiKTeX 2.9. I see that inconsolata.sty is present under the \doc\fonts folder. How can I eliminate this problem?
Thanks,
Ravi

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Vectorized code for generating the Kac (Clement) matrix
Thank you, Berend and Enrico, for looking into this. I did not think of Enrico's clever use of cbind() to form the subsetting indices.
Best,
Ravi

From: Berend Hasselman [b...@xs4all.nl]
Sent: Friday, April 26, 2013 10:08 AM
To: Enrico Schumann
Cc: Ravi Varadhan; 'r-help@r-project.org'
Subject: Re: [R] Vectorized code for generating the Kac (Clement) matrix

On 26-04-2013, at 14:42, Enrico Schumann wrote:

> On Thu, 25 Apr 2013, Ravi Varadhan writes:
>
>> Hi, I am generating large Kac matrices (also known as Clement matrices). This is a tridiagonal matrix. I was wondering whether there is a vectorized solution that avoids the `for' loops in the following code:
>>
>> n <- 1000
>> Kacmat <- matrix(0, n+1, n+1)
>> for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
>> for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1
>>
>> The above code is fast, but I am curious about vectorized ways to do this.
>> Thanks in advance.
>> Best,
>> Ravi
>
> This may be a bit faster; but as Berend and you said, the original function seems already fast.
>
> n <- 5000
>
> f1 <- function(n) {
>   Kacmat <- matrix(0, n+1, n+1)
>   for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
>   for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1
>   Kacmat
> }
> f3 <- function(n) {
>   n1 <- n + 1L
>   res <- numeric(n1 * n1)
>   dim(res) <- c(n1, n1)
>   bw <- n:1L  ## bw = backward, fw = forward
>   fw <- seq_len(n)
>   res[cbind(fw, fw + 1L)] <- bw
>   res[cbind(fw + 1L, fw)] <- fw
>   res
> }
>
> system.time(K1 <- f1(n))
> system.time(K3 <- f3(n))
> identical(K3, K1)
>
> ##    user  system elapsed
> ##   0.132   0.028   0.161
> ##
> ##    user  system elapsed
> ##   0.024   0.048   0.071
> ##

Using some of your code in my function I was able to speed up my function f2. Complete code:

f1 <- function(n) { # Ravi
  Kacmat <- matrix(0, n+1, n+1)
  for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
  for (i in 1:n) Kacmat[i+1, i] <- i
  Kacmat
}

f2 <- function(n) { # Berend 1, modified to use 1L
  Kacmat <- matrix(0, n+1, n+1)
  Kacmat[row(Kacmat)==col(Kacmat)-1L] <- n:1L
  Kacmat[row(Kacmat)==col(Kacmat)+1L] <- 1L:n
  Kacmat
}

f3 <- function(n) { # Enrico
  n1 <- n + 1L
  res <- numeric(n1 * n1)
  dim(res) <- c(n1, n1)
  bw <- n:1L  ## bw = backward, fw = forward
  fw <- seq_len(n)
  res[cbind(fw, fw + 1L)] <- bw
  res[cbind(fw + 1L, fw)] <- fw
  res
}

f4 <- function(n) { # Berend 2, using which with arr.ind=TRUE
  Kacmat <- matrix(0, n+1, n+1)
  k1 <- which(row(Kacmat)==col(Kacmat)-1L, arr.ind=TRUE)
  k2 <- which(row(Kacmat)==col(Kacmat)+1L, arr.ind=TRUE)
  Kacmat[k1] <- n:1L
  Kacmat[k2] <- 1L:n
  Kacmat
}

library(compiler)
f1.c <- cmpfun(f1)
f2.c <- cmpfun(f2)
f3.c <- cmpfun(f3)
f4.c <- cmpfun(f4)

f1(n)
f2(n)

n <- 5000
system.time(K1 <- f1(n))
system.time(K2 <- f2(n))
system.time(K3 <- f3(n))
system.time(K4 <- f4(n))
system.time(K1c <- f1.c(n))
system.time(K2c <- f2.c(n))
system.time(K3c <- f3.c(n))
system.time(K4c <- f4.c(n))
identical(K2,K1)
identical(K3,K1)
identical(K4,K1)
identical(K1c,K1)
identical(K2c,K2)
identical(K3c,K3)
identical(K4c,K4)

Result:

# > system.time(K1 <- f1(n))
#    user  system elapsed
#   0.387   0.120   0.511
# > system.time(K2 <- f2(n))
#    user  system elapsed
#   3.541   0.702   4.250
# > system.time(K3 <- f3(n))
#    user  system elapsed
#   0.108   0.089   0.199
# > system.time(K4 <- f4(n))
#    user  system elapsed
#   1.975   0.355   2.336
# > system.time(K1c <- f1.c(n))
#    user  system elapsed
#   0.323   0.120   0.445
# > system.time(K2c <- f2.c(n))
#    user  system elapsed
#   3.374   0.422   3.807
# > system.time(K3c <- f3.c(n))
#    user  system elapsed
#   0.107   0.098   0.205
# > system.time(K4c <- f4.c(n))
#    user  system elapsed
#   1.816   0.384   2.203
# > identical(K2,K1)
# [1] TRUE
# > identical(K3,K1)
# [1] TRUE
# > identical(K4,K1)
# [1] TRUE
# > identical(K1c,K1)
# [1] TRUE
# > identical(K2c,K2)
# [1] TRUE
# > identical(K3c,K3)
# [1] TRUE
# > identical(K4c,K4)
# [1] TRUE

So Ravi's original and Enrico's versions are the quickest. Using which with arr.ind made my version run a lot quicker. All in all an interesting exercise.

Berend

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Vectorized code for generating the Kac (Clement) matrix
Hi,
I am generating large Kac matrices (also known as Clement matrices). This is a tridiagonal matrix. I was wondering whether there is a vectorized solution that avoids the `for' loops in the following code:

n <- 1000
Kacmat <- matrix(0, n+1, n+1)
for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1

The above code is fast, but I am curious about vectorized ways to do this.
Thanks in advance.
Best,
Ravi

Ravi Varadhan, Ph.D.
Assistant Professor
The Center on Aging and Health
Division of Geriatric Medicine & Gerontology
Johns Hopkins University
rvarad...@jhmi.edu
410-502-2619

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Crrstep help
Kathy,
You need to make sure that you have installed the latest version of the `cmprsk' package (version 2.2-4). crrstep will not work otherwise.
Ravi

From: Ravi Varadhan
Sent: Thursday, March 28, 2013 5:25 PM
To: 'kathyhan...@gmail.com'
Cc: r-help@r-project.org
Subject: Re: [R] Crrstep help

Hi Kathy,
You should first contact the package maintainer (which is me!) before posting to r-help. Which version of the package are you using? Can you send me a minimal reproducible example?
Ravi

Ravi Varadhan, Ph.D.
Assistant Professor
The Center on Aging and Health
Division of Geriatric Medicine & Gerontology
Johns Hopkins University
rvarad...@jhmi.edu
410-502-2619

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Crrstep help
Hi Kathy, You should first contact the package maintainer (which is me!) before posting to r-help. Which version of the package are you using? Can you send me a minimal, reproducible example? Ravi Ravi Varadhan, Ph.D. Assistant Professor The Center on Aging and Health Division of Geriatric Medicine & Gerontology Johns Hopkins University rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> 410-502-2619 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constrained Optimization in R (alabama)
Berend, I had pointed out this solution to Axel already, but he apparently wants to solve more general problems like this, so he wanted a solution from constrained optimizers. But, if the more general problems have a similarly discrete set of candidate parameter values, using constrained optimizers like alabama would still NOT be a good approach. It would be easiest to just go through the discrete values. Ravi From: Berend Hasselman [b...@xs4all.nl] Sent: Monday, February 11, 2013 12:45 AM To: Axel Urbiz Cc: R-help@r-project.org; Ravi Varadhan Subject: Re: [R] Constrained Optimization in R (alabama) On 10-02-2013, at 21:16, Axel Urbiz wrote: > Dear List, > > I'm trying to solve this simple optimization problem in R. The parameters > are the exponents to the matrix mm. The constraints specify that each row > of the parameter matrix should sum to 1 and their product to 0. I don't > understand why the constraints are not satisfied at the solution. I must be > misinterpreting how to specify the constrains somehow. > > > library(alabama) > > ff <- function (x) { > mm <- matrix(c(10, 25, 5, 10), 2, 2) > matx <- matrix(x, 2, 2) > -sum(apply(mm ^ matx, 1, prod)) > } > > > ### constraints > > heq <- function(x) { > h <- rep(NA, 1) > h[1] <- x[1] + x[3] -1 > h[2] <- x[2] + x[4] -1 > h[3] <- x[1] * x[3] > h[4] <- x[2] * x[4] > h > } > > res <- constrOptim.nl(par = c(1, 1, 1, 1), fn = ff, > heq = heq) > res$convergence #why NULL? > matrix(round(res$par, 2), 2, 2) #why constraints are not satisfied? There is no need to use an optimization function in this case. You can easily find the optimum by hand. From the constraints: x[3] equals (1-x[1]) x[4] equals (1-x[2]) Therefore x[1] * (1-x[1]) equals 0 x[2] * (1-x[2]) equals 0 So you only need to calculate the function value for 4 vectors: x1 <- c(1,1,0,0) x2 <- c(0,0,1,1) x3 <- c(1,0,0,1) x4 <- c(0,1,1,0) Modifying your ff function in the way Ravi suggested: ff <- function (x) { mm <- matrix(c(10, 25, 5, 10), 2, 2) matx <- matrix(x, 2, 2) sum(apply(mm ^ matx, 1, prod)) } and running ff with x{1,2,3,4} as argument, one gets > ff(x1) [1] 35 > ff(x2) [1] 15 > ff(x3) [1] 20 > ff(x4) [1] 30 So ff(x2) achieves the minimum, which corresponds to Ravi's result with auglag. Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constrained Optimization in R (alabama)
Axel, Your objective function has the wrong sign. The maximum is infinity, so it does not make sense to maximize. You should be minimizing the product. So, remove the negative sign and it works as you had expected. ff <- function (x) { mm <- matrix(c(10, 25, 5, 10), 2, 2) matx <- matrix(x, 2, 2) # - sum(apply(mm ^ matx, 1, prod)) # this is the problem sum(apply(mm ^ matx, 1, prod)) } ### constraints heq <- function(x) { h <- rep(NA, 1) h[1] <- x[1] + x[3] -1 h[2] <- x[2] + x[4] -1 h[3] <- x[1] * x[3] h[4] <- x[2] * x[4] h } res <- auglag(par = rep(1,4), fn = ff, heq = heq) Ravi From: Axel Urbiz [axel.ur...@gmail.com] Sent: Sunday, February 10, 2013 3:16 PM To: R-help@r-project.org; Ravi Varadhan Subject: Constrained Optimization in R (alabama) Dear List, I'm trying to solve this simple optimization problem in R. The parameters are the exponents to the matrix mm. The constraints specify that each row of the parameter matrix should sum to 1 and their product to 0. I don't understand why the constraints are not satisfied at the solution. I must be misinterpreting how to specify the constrains somehow. library(alabama) ff <- function (x) { mm <- matrix(c(10, 25, 5, 10), 2, 2) matx <- matrix(x, 2, 2) -sum(apply(mm ^ matx, 1, prod)) } ### constraints heq <- function(x) { h <- rep(NA, 1) h[1] <- x[1] + x[3] -1 h[2] <- x[2] + x[4] -1 h[3] <- x[1] * x[3] h[4] <- x[2] * x[4] h } res <- constrOptim.nl(par = c(1, 1, 1, 1), fn = ff, heq = heq) res$convergence #why NULL? matrix(round(res$par, 2), 2, 2) #why constraints are not satisfied? Axel. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constrained Optimization in R (alabama)
Your constraints are non-sensical. The only way for these constraints to be satisfied is for two of the parameters to be 0 and the other two to be 1. Please spend time to formulate your problem correctly, before imposing on others' time. Ravi From: Axel Urbiz [axel.ur...@gmail.com] Sent: Sunday, February 10, 2013 3:16 PM To: R-help@r-project.org; Ravi Varadhan Subject: Constrained Optimization in R (alabama) Dear List, I'm trying to solve this simple optimization problem in R. The parameters are the exponents to the matrix mm. The constraints specify that each row of the parameter matrix should sum to 1 and their product to 0. I don't understand why the constraints are not satisfied at the solution. I must be misinterpreting how to specify the constrains somehow. library(alabama) ff <- function (x) { mm <- matrix(c(10, 25, 5, 10), 2, 2) matx <- matrix(x, 2, 2) -sum(apply(mm ^ matx, 1, prod)) } ### constraints heq <- function(x) { h <- rep(NA, 1) h[1] <- x[1] + x[3] -1 h[2] <- x[2] + x[4] -1 h[3] <- x[1] * x[3] h[4] <- x[2] * x[4] h } res <- constrOptim.nl(par = c(1, 1, 1, 1), fn = ff, heq = heq) res$convergence #why NULL? matrix(round(res$par, 2), 2, 2) #why constraints are not satisfied? Axel. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Non linear programming: choose R that minimizes corr(y,x^R)
Hi John, Do you want to minimize correlation or |correlation|? Here is a simple code, with example, that shows how to optimize the magnitude (that is corr^2) of correlation between y and x^p, where p is a positive real number. You can modify it to suit your needs. corxy.opt <- function(x, y, interval=c(0,4), maximum=TRUE){ obj <- function(p, x, y) cor(y, x^p)^2 ans <- optimize(f=obj, interval=interval, maximum=maximum, x=x, y=y) ans } set.seed(321) x <- runif(100) y <- rnorm(100, mean=1-2*x, sd=1) corxy.opt(x, y) # Why minimizing does not make sense obj2 <- function(p, x, y) sapply(p, function(p) cor(y, x^p)^2) plot(seq(0,5,length=1000), obj2(seq(0,5,length=1000),x,y), type="l") Best, Ravi Ravi Varadhan, Ph.D. Assistant Professor The Center on Aging and Health Division of Geriatric Medicine & Gerontology Johns Hopkins University rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> 410-502-2619 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] starting values in glm(..., family = binomial(link =log))
I did not get any warnings when I ran your data/model example. From: Fischer, Felix [mailto:felix.fisc...@charite.de] Sent: Wednesday, January 30, 2013 11:19 AM To: Ravi Varadhan Cc: r-help@r-project.org Subject: Re: [R] starting values in glm(..., family = binomial(link =log)) Thanks for your replies! It seems that I can fit my model now when I provide the "right" starting values; however, there remain warnings, such as: 1: In log(ifelse(y == 1, 1, (1 - y)/(1 - mu))) : NaNs produced 2: step size truncated due to divergence 3: step size truncated: out of bounds ... That makes me feel uncomfortable and I wonder whether I can "trust" the fitted model. Why is this kind of regression so picky about starting values compared to logistic regression? And is there a way to explain the difference between binomial and quasibinomial to a simple mind like mine? Best, Felix From: Ravi Varadhan [mailto:ravi.varad...@jhu.edu] Sent: Wednesday, January 30, 2013 5:02 PM To: Fischer, Felix Cc: r-help@r-project.org<mailto:r-help@r-project.org> Subject: [R] starting values in glm(..., family = binomial(link =log)) Try this: Age_log_model = glm(Arthrose ~ Alter, data=x, start=c(-1, 0), family=quasibinomial(link = log)) Ravi Ravi Varadhan, Ph.D. Assistant Professor The Center on Aging and Health Division of Geriatric Medicine & Gerontology Johns Hopkins University rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> 410-502-2619 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] starting values in glm(..., family = binomial(link =log))
Try this: Age_log_model = glm(Arthrose ~ Alter, data=x, start=c(-1, 0), family=quasibinomial(link = log)) Ravi Ravi Varadhan, Ph.D. Assistant Professor The Center on Aging and Health Division of Geriatric Medicine & Gerontology Johns Hopkins University rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> 410-502-2619 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] model selection with spg and AIC (or, convert list to fitted model object)
Adam, Getting the variance of the MLE when the true parameter is on the boundary is a very difficult problem. It is known that the standard bootstrap does not work. There are some sub-sampling approaches (Springer book: Politis, Romano, Wolf), but I am not an expert on this. However, an important question is how we know that the truth is actually on the boundary. If you can assume that the truth is in the interior, then you can use standard approaches: observed hessian or bootstrap to get standard errors. Best, Ravi From: Adam Zeilinger [mailto:ad...@ucr.edu] Sent: Sunday, December 02, 2012 5:04 PM To: Ravi Varadhan Cc: Adam Zeilinger (zeil0...@umn.edu); r-help@r-project.org Subject: Re: [R] model selection with spg and AIC (or, convert list to fitted model object) Dear Ravi, Thank you so much for the help. I switched to using the optimx function but I continue to use the spg method (for the most part) because I found that only spg consistently converges given different datasets. I also decided to use AIC rather than a likelihood ratio test. I have a new question. I would like to construct 95% confidence intervals for the parameter estimates from the best model. From a previous R Help thread, you said that it was a bad idea to use the Hessian matrix computed from optim/optimx or hessian() [numDeriv package] when the optimization is constrained and parameter estimates are on the boundary because the MLE is likely at a local minimum: http://tolstoy.newcastle.edu.au/R/e15/help/11/09/6673.html In the same thread, you suggest using the Hessian matrix from augmented Lagrangian optimization with auglag() [alabama package] (with some caveats). I would like to construct 95% CI with auglag, but I don't understand how to write the inequality constraints (hin) function. Could you please help me write the hin function? Below is R code for my MLE problem, using data that results in parameter estimates on the boundary, and my unsuccessful attempt at auglag() optimization. Note: I have gradient functions for NLL1 and NLL2 but they're very large and don't seem to improve optimization, so they are not included here. I can supply them if needed for the auglag() function. Also, I am running R 2.15.2 on a Windows 7 64-bit machine.
## library(optimx) library(alabama) # define multinomial distribution dmnom2 <- function(x,prob,log=FALSE) { r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1)) if (log) r else exp(r) } # data frame y y <- structure(list(t = c(0.167, 0.5, 1, 12, 18, 24, 36), n1 = c(1L, 1L, 1L, 8L, 9L, 10L, 12L), n2 = c(0L, 1L, 2L, 6L, 5L, 3L, 2L), n3 = c(13L, 12L, 11L, 0L, 0L, 1L, 0L)), .Names = c("t", "n1", "n2", "n3"), class = "data.frame", row.names = 36:42) # Negative log-likelihood functions NLL1 <- function(par, y) { p1 <- par[1] p2 <- p1 mu1 <- par[2] mu2 <- mu1 t <- y$t n1 <- y$n1 n2 <- y$n2 n3 <- y$n3 P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) + mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) - exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)* mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) + 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2*t)*mu2* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)/ exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2*t)/(2*(mu2*p1 + mu1*(mu2 + p2))* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2 P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 + mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) - exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)* mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) + 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2*t)*mu1* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)/ exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2*t)/(2*(mu2*p1 + mu1*(mu2 + p2))* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2 P3 <- 1 - P1 - P2 p.all <- c(P1, P2, P3) if(all(p.all > 0 & p.all < 1)) -sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE)) else 1e07 } NLL2 <- function(par, y) { p1 <- par[1] p2 <- par[2] mu1 <- par[3] mu2 <- par[4] t <- y$t n1 <- y$n1 n2 <- y$n2 n
Re: [R] Integration in R
Send us reproducible R code that shows what you actually tried. Ravi Varadhan, Ph.D. Assistant Professor The Center on Aging and Health Division of Geriatric Medicine & Gerontology Johns Hopkins University rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> 410-502-2619 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] vectorized uni-root?
Hi Ivo, The only problem is that uniroot() actually uses Brent's algorithm, and is based on the C code from netlib (there is also Fortran code available - zeroin.f). Brent's algorithm is more sophisticated than the simple bisection approach that you have vectorized. It combines bisection and secant methods along with inverse quadratic interpolation. The idea behind this hybrid approach (which by the way is a fundamental theme in all of numerical analysis) is to get faster convergence while not sacrificing the global convergence of bisection. So, the existing uniroot() will not be deprecated unless you can vectorize Brent's hybrid root-finding approach! Best, Ravi Ravi Varadhan, Ph.D. Assistant Professor The Center on Aging and Health Division of Geriatric Medicine & Gerontology Johns Hopkins University rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> 410-502-2619 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
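For contrast with Brent's hybrid, a minimal sketch of what a vectorized root-finder can look like (plain bisection only, not Brent; the function and brackets are illustrative):

vbisect <- function(f, lo, hi, tol = 1e-8, maxit = 100) {
  # f must be vectorized; each interval [lo[i], hi[i]] must bracket a root
  flo <- f(lo)
  for (i in seq_len(maxit)) {
    mid <- (lo + hi) / 2
    fmid <- f(mid)
    left <- flo * fmid <= 0      # sign change in [lo, mid]: root is there
    hi[left] <- mid[left]
    lo[!left] <- mid[!left]
    flo[!left] <- fmid[!left]
    if (max(hi - lo) < tol) break
  }
  (lo + hi) / 2
}
vbisect(function(x) x^2 - c(2, 3, 5), lo = rep(0, 3), hi = rep(3, 3))
# approximately sqrt(c(2, 3, 5))

Each interval merely halves per iteration (linear convergence), which is exactly the behavior Brent's combination of secant steps and inverse quadratic interpolation improves upon for each individual root.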
Re: [R] optim & .C / Crashing on run
You might also want to try the Nelder-Mead algorithm, nmk(), in the "dfoptim" package. It is a better algorithm than the Nelder-Mead in optim. It is all R code, so you might be able to modify it to fit your needs. Ravi Ravi Varadhan, Ph.D. Assistant Professor The Center on Aging and Health Division of Geriatric Medicine & Gerontology Johns Hopkins University rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> 410-502-2619 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
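A minimal usage sketch of the suggestion above (assuming the "dfoptim" package is installed; the test function is a stand-in for the poster's objective):

library(dfoptim)
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2  # Rosenbrock
ans <- nmk(par = c(-1.2, 1), fn = fr)
ans$par          # should be close to c(1, 1)
ans$convergence  # 0 indicates successful convergence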
Re: [R] Struggeling with nlminb...
Try optimx() in the "optimx" package with the control option `all.methods=TRUE'. Ravi Ravi Varadhan, Ph.D. Assistant Professor The Center on Aging and Health Division of Geriatric Medicine & Gerontology Johns Hopkins University rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> 410-502-2619 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
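A small sketch of that call (assuming the "optimx" package is installed; the objective is illustrative, not the poster's):

library(optimx)
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2  # Rosenbrock
ans <- optimx(par = c(-1.2, 1), fn = fr, control = list(all.methods = TRUE))
print(ans)  # one row per method: estimates, objective value, convergence code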
Re: [R] Creating Optimization Constraints
You have nonlinear constraints on the parameters. Take a look at constrained optimization algorithms in packages "alabama" or "Rsolnp" that can handle non-linearly constrained optimization problems. Ravi Ravi Varadhan, Ph.D. Assistant Professor The Center on Aging and Health Division of Geriatric Medicine & Gerontology Johns Hopkins University rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> 410-502-2619 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fit a "threshold" function with nls
Veronique, Can you write down the likelihood function in R and send it to me (this is very easy to do, but I don't want to do your work)? Also send me the code for simulating the data. I will show you how to fit such models using optimization tools. Best, Ravi From: Vito Muggeo [vito.mug...@unipa.it] Sent: Tuesday, October 16, 2012 9:55 AM To: Bert Gunter Cc: Véronique Boucher Lalonde; r-help@r-project.org; Ravi Varadhan Subject: Re: [R] fit a "threshold" function with nls Véronique, in addition to Bert's comments, I would like to bring to your attention that there are several packages that perform threshold/breakpoint/changepoint estimation in R, including cumSeg, segmented, strucchange, and bcp for a Bayesian approach. Moreover, some packages, such as cghFLasso, perform smoothing with an L1 penalty that can yield a step-function fit. I hope this helps you, vito On 15/10/2012 22.21, Bert Gunter wrote: > Véronique: > > I've cc'ed this to a true expert (Ravi Varadhan) who is one of those > who can give you a definitive response, but I **believe** the problem > is that threshold-type function fits have objective functions whose > derivatives are discontinuous, and hence gradient-based methods can > run into the sorts of problems that you see. > > **If** this is so, then you might do better using an explicit > non-gradient optimizer = rss minimizer via one of the optim() suite of > functions (maybe even the default Nelder-Mead); but this is definitely > where the counsel of an expert would be valuable. Also check out the > CRAN Optimization Task View for advice on optimization options. > > Cheers, > Bert > > On Mon, Oct 15, 2012 at 9:43 AM, Véronique Boucher Lalonde > wrote: >> I am trying to model a dependent variable as a threshold function of >> my independent variable. >> What I mean is that I want to fit different intercepts to y following 2 >> breakpoints, but with fixed slopes. >> I am trying to do this by using ifelse statements in the nls function. >> Perhaps, this is not an appropriate approach. >> >> I have created a very simple example to illustrate what I am trying to do. >> >> #Creating my independent variable >> x <- seq(0, 1000) >> #Creating my dependent variable, where all values below threshold #1 are 0, >> all values above threshold #2 are 0 and all values within the two >> thresholds are 1 >> y <- ifelse(x < 300, 0, ifelse(x>700, 0, 1)) >> #Adding noise to my dependent variable >> y <- y + rnorm(length(y), 0, 0.01) >> #I am having trouble clearly explaining the model I want to fit but perhaps >> the plot is self explanatory >> plot(x, y) >> #Now I am trying to adjust a nonlinear model to fit the two breakpoints, >> with known slopes between the breakpoints (here, respectively 0, 1 and 0) >> threshold.model <- nls(y ~ ifelse(x < b1, 0, ifelse(x > b2, 0, 1)), >> start=list(b1=300, b2=700), trace=T) >> >> I have played around with this function quite a bit and systematically get >> an error saying: singular gradient matrix at initial parameter estimates >> I admit that I don't fully understand what a singular gradient matrix >> implies. >> But it seems to me that both my model and starting values are sensible >> given the data, and that the break points are in fact estimable. >> Could someone point to what I am doing wrong?
>> >> More generally, I am interested in knowing >> (1) whether I can use such ifelse statements in the function nls >> (2) whether I can even use nls for this model >> (3) whether I can model this with a function that would allow me to assume >> that the errors are binomial, rather than normally distributed >> (ultimately I want to use such a model on binomial data) >> >> I am using R version 2.15.1 on 64-bit Windows 7 >> >> Any guidance would be greatly appreciated. >> Veronique >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > -- Vito M.R. Muggeo Dip.to Sc Statist e Matem `Vianelli' Università di Palermo viale delle Scienze, edificio 13 90128 Palermo - ITALY tel: 091 23895240 fax: 091 485726 http://dssm.unipa.it/vmuggeo __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] model selection with spg and AIC (or, convert list to fitted model object)
Adam, See the attached R code that solves your problem and beyond. One important issue is that you are enforcing constraints only indirectly. You need to make sure that P1, P2, and P3 (which are functions of the original parameters and time) are all between 0 and 1. It is not enough to impose constraints on the original parameters p1, p2, mu1 and mu2. I also show you how to do a likelihood ratio test with the results from spg. You can also do the same for nlminb or Rvmmin. Finally, I also show you how to use optimx to compare different algorithms. This shows you that in addition to spg, you also get very good results (as seen from objective function values and KKT conditions) with a number of other algorithms (e.g., Rvmmin, nlminb), many of which are much faster than spg. This example illustrates the utility of optimx(). I am always surprised that more R users doing optimization are not using optimx. It is a very powerful tool for benchmarking unconstrained and box-constrained optimization problems. It deserves to be used widely, in my biased, but correct, opinion. Ravi Ravi Varadhan, Ph.D. Assistant Professor The Center on Aging and Health Division of Geriatric Medicine & Gerontology Johns Hopkins University rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> 410-502-2619 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Constraint Optimization with constrOptim
Hi Raimund, You can use spg() in the BB package. This requires that you have a projection rule for projecting an infeasible point onto the simplex defined by: 0 <= x <= 1, sum(x) = 1. I show you here how to project onto the unit simplex. So, this is how your problem can be solved: require(BB) proj.simplex <- function(y, c=1) { # # project an n-dim vector y to the simplex S_n # S_n = { x | x \in R^n, 0 <= x <= c, sum(x) = c} # Copyright: Ravi Varadhan, Johns Hopkins University # August 8, 2012 # n <- length(y) sy <- sort(y, decreasing=TRUE) csy <- cumsum(sy) rho <- max(which(sy > (csy - c)/(1:n))) theta <- (csy[rho] - c) / rho return(pmax(0, y - theta)) } # Rendite (returns) data dataset <- matrix(c(0.019120, 0.033820, -0.053180, 0.036220, -0.021480, -0.029380, -0.012180, -0.076980, -0.060380, 0.038901, -0.032107, -0.034153, -0.031944, 0.006494, -0.021062, -0.058497, 0.050473, 0.026086, 0.004916, -0.015996, 0.003968, 0.030721, 0.034774, -0.004972, 0.019459, 0.000196, -0.001849), nrow=3, ncol=9, byrow=TRUE) #Computation of the variance-covariance matrix covarianceMatrix <- ((t(dataset)) %*% dataset) / 3 fkt <- function(x, covMat) {t(x) %*% covMat %*% x} startingEstimates = rep(1/9, 9) ans <- spg(startingEstimates, fn=fkt, project=proj.simplex, covMat = covarianceMatrix) ans sum(ans$par) ans <- spg(runif(9), fn=fkt, project=proj.simplex, covMat = covarianceMatrix) # non-uniqueness of the optimal solution ans sum(ans$par) # You see that there is an infinite number of solutions. Because your covariance matrix is rank-deficient, it only has rank 3. # You get a unique solution when the covariance matrix is positive-definite aMat <- matrix(rnorm(81), 9, 9) covMat <- aMat %*% t(aMat) ans <- spg(runif(9), fn=fkt, project=proj.simplex, covMat=covMat) # unique solution ans sum(ans$par) Hope this is helpful, Ravi __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Coxph not converging with continuous variable
Dear Terry, I agree that this example is highly atypical. Having said that, my experience with optimization algorithms is that scaling (a.k.a. standardizing) the continuous covariates is greatly helpful in terms of convergence. Have you considered automatically standardizing the continuous covariates, and then converting the scaled coefficients back to the original scale? Of course, the end user could do this just as easily! Best, Ravi Ravi Varadhan, Ph.D. Assistant Professor The Center on Aging and Health Division of Geriatric Medicine & Gerontology Johns Hopkins University rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> 410-502-2619 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
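A sketch of the standardize-then-backtransform idea from the message above (the lung data from the "survival" package is used purely for illustration):

library(survival)
fit.scaled <- coxph(Surv(time, status) ~ scale(age), data = lung)
# coefficient on the standardized scale; dividing by the covariate's SD
# recovers the coefficient on the original scale
coef(fit.scaled) / sd(lung$age)
coef(coxph(Surv(time, status) ~ age, data = lung))  # should agree

Since standardization is a linear reparametrization, the two fits are equivalent; only the optimizer's path differs, which is where the better scaling helps.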
Re: [R] A contour plot question - vis.gam () function in "mgcv"
Thank you, Peter. I don't know why I didn't think of this! Also, thanks to Ilai. Ravi From: Peter Ehlers [ehl...@ucalgary.ca] Sent: Tuesday, April 03, 2012 8:22 PM To: ilai Cc: Ravi Varadhan; r-help@r-project.org Subject: Re: [R] A contour plot question - vis.gam () function in "mgcv" On 2012-04-03 15:49, ilai wrote: > Try to plot the points first followed by vis.gam(...,type='contour', > color='bw', add=T) instead of vis.gam followed by points. > > HTH Or, if vis.gam gives you default scales that you wish to preserve, then just replot the contours over the points with vis.gam(., add = TRUE) Peter Ehlers > > > On Tue, Apr 3, 2012 at 2:48 PM, Ravi Varadhan wrote: >> Hi, >> >> Please see the attached contour plot (I am sorry about the big file). This >> was created using the vis.gam() function in "mgcv" package. However, my >> question is somewhat broader. >> >> In generating this figure, I first created the contours using vis.gam() and >> then I plotted the points. These point are plotted on top of the contours >> so that some of the contour lines are only partially visible. Is there a >> way to make the contour lines fully visible? I cannot reverse the order and >> plot the points first and then call vis.gam(). Or, can I? Are there other >> options? >> >> Thanks for any help or hints. >> >> Best, >> Ravi >> >> >> Ravi Varadhan, Ph.D. >> Assistant Professor >> The Center on Aging and Health >> Division of Geriatric Medicine& Gerontology >> Johns Hopkins University >> rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> >> 410-502-2619 >> >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] quadratic programming-maximization instead of
Maximizing f(x) = x'Ax makes sense only when A is negative-definite. Therefore, this is the same as minimizing x'Bx, where B = -A, and B is positive-definite. In other words, you should be able to simply flip the sign of the original matrix. This should yield a positive-definite matrix since the original matrix ought to be negative-definite. Ravi --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
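A small sketch of the sign flip using the "quadprog" package (the matrix and the constraint sum(x) = 1 are illustrative; solve.QP minimizes (1/2) x'Dx - d'x, so maximizing x'Ax corresponds to Dmat = -2*A):

library(quadprog)
A <- -diag(c(1, 2, 3))                 # a negative-definite matrix
stopifnot(all(eigen(-A)$values > 0))   # B = -A is positive-definite
sol <- solve.QP(Dmat = -2 * A, dvec = rep(0, 3),
                Amat = matrix(1, 3, 1), bvec = 1, meq = 1)  # sum(x) = 1
sol$solution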
[R] problem of "constrOptim.nl", no hessian and convergence
Hi, Use the `auglag' function in "alabama" if you want to get the Hessian at convergence. This typically tends to perform better than `constrOptim.nl'. Also, `constrOptim.nl' does not compute the Hessian. You should not specify method="L-BFGS-B". The default method "BFGS" is better in this setting. Hope this helps, Ravi ------- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Non-negativity constraints for logistic regression
Hi Thomas, Using box-constrained optimizer in glm.fit is a good suggestion for finding the point estimates. However, there is still the issue of making inference, i.e., computing the variances and p-values for the estimates. You have to deal with the issue of MLE possibly being on the boundary. Asymptotic distribution of MLE estimators will not be normal in the case of convergence at the boundary. This is a difficult problem. Best, Ravi --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
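A sketch of the point-estimation part on simulated data (illustrative only; the inference problem described above remains regardless of how the estimates are computed):

set.seed(1)
X <- cbind(1, matrix(rnorm(400), 200, 2))
y <- rbinom(200, 1, plogis(X %*% c(-0.5, 1, 0.2)))
negloglik <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * eta - log(1 + exp(eta)))   # Bernoulli log-likelihood, logit link
}
# intercept left free; slope coefficients constrained to be non-negative
optim(par = c(0, 0.1, 0.1), fn = negloglik, method = "L-BFGS-B",
      lower = c(-Inf, 0, 0))$par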
[R] performance of adaptIntegrate vs. integrate
The integrand is highly peaked. It is approximately an impulse function where much of the mass is concentrated at a very small interval. Plot the function and see for yourself. This is the likely cause of the problem. Other types of integrands where you could experience problems are: integrands with singularity at either limit and slowly decaying oscillatory integrands. As to why integrate performs better than adaptIntegrate in this situation, I don't know. You have to study the two implementations. Wynn's epsilon algorithm is an extrapolation method for improving the convergence of a sequence. This could be an explanation for the better performance, but I cannot say for sure. Hope this is helpful, Ravi --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
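A sketch of the kind of integrand being described (values are illustrative; the true integral over (0, 1) is essentially 1, but a quadrature rule can miss the spike entirely if no abscissa lands near it):

f <- function(x) dnorm(x, mean = 0.5, sd = 1e-4)
curve(f, 0, 1, n = 10001)   # virtually all of the mass sits near x = 0.5
integrate(f, 0, 1)          # may or may not locate the spike
integrate(f, 0.49, 0.51)    # helping the integrator with tighter limits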
[R] optim seems to be finding a local minimum
Hi Dimitri, Your problem has little to do with local versus global optimum. You can convince yourself that the solution you got is not even a local optimum by checking the gradient at the solution. The main issue is that your objective function is not differentiable everywhere. So, you have 2 options: either you use a smooth objective function (e.g. squared residuals) or you use an optimization algorithm that can handle non-smooth objective functions. Here I show that your problem is well solved by the `nmkb' function (a bound-constrained version of the Nelder-Mead simplex method) from the "dfoptim" package.

library(dfoptim)
> myopt2 <- nmkb(fn=myfunc, par=c(0.1,max(IV)), lower=0)
> myopt2
$par
[1] 8.897590e-01 9.470163e+07
$value
[1] 334.1901
$feval
[1] 204
$restarts
[1] 0
$convergence
[1] 0
$message
[1] "Successful convergence"

Then, there is also the issue of properly scaling your function, because it is poorly scaled. Look how different the 2 parameters are - they are 7 orders of magnitude apart. You are really asking for trouble here. Hope this is helpful, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Poor performance of "Optim"
Hi, You really need to study the documentation of "optim" carefully before you make broad generalizations. There are several algorithms available in optim. The default is a simplex-type algorithm called Nelder-Mead. I think this is an unfortunate choice as the default algorithm. Nelder-Mead is a robust algorithm that can work well for almost any kind of objective function (smooth or nasty). However, the trade-off is that it is very slow in terms of convergence rate. For simple, smooth problems, such as yours, you should use "BFGS" (or "L-BFGS-B" if you have simple box-constraints). Also, take a look at the "optimx" package and the most recent paper in J Stat Software on optimx for a better understanding of the wide array of optimization options available in R. Best, Ravi. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
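A quick illustration of the advice (the test function is a stand-in; exact counts will vary):

fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2  # Rosenbrock
optim(c(-1.2, 1), fr)$counts                   # Nelder-Mead, the default
optim(c(-1.2, 1), fr, method = "BFGS")$counts  # compare evaluation counts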
[R] Download statistics for a package
Hi, How can I get information on how many times a particular package has been downloaded from CRAN? Thanks, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Global optimisation with inequality constraints
Hi, Take a look at the multiStart() function in the "BB" package. This allows you to specify random multiple starting values, which need not satisfy constraints. Then you need to specify the constraints using the `projectLinear' argument. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Imposing Feller condition using project constraint in spg
Hi Kristian, The idea behind projection is that you take an iterate that violates the constraints and project it onto the nearest point that satisfies the constraints. Suppose you have an iterate (w1, w2) that does not satisfy the constraint w1 * w2 >= (1 + eps)/2. Our goal is to find a (w1', w2'), given (w1, w2), such that (A) w1' * w2' = (1+eps)/2 = k and (B) (w1-w1')^2 + (w2-w2')^2 is minimum. This is quite easy to solve. We know (w1, w2). You plug in w2' = k/w1' from (A) into (B) and minimize the function of w1'. This is a simple calculus exercise, and I will leave this as a homework problem for you to solve! Best, Ravi. ------- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
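One way to check the resulting algebra numerically, without giving away the closed-form homework answer (a sketch; the search bracket and the test values are illustrative):

project.pair <- function(w1, w2, k, upper = 10) {
  # minimize (B) over w1' after substituting w2' = k / w1' from (A)
  obj <- function(w1p) (w1 - w1p)^2 + (w2 - k / w1p)^2
  w1p <- optimize(obj, interval = c(1e-8, upper))$minimum
  c(w1p, k / w1p)
}
wp <- project.pair(w1 = 0.3, w2 = 0.4, k = 0.5)
wp
prod(wp)   # equals k by construction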
Re: [R] Hessian matrix issue
"However, it is not known whether the standard errors obtained from this Hessian are asymptotically valid." Let me rephrase this. I think that as a measure of dispersion, the standard error obtained using the augmented Lagrangian algorithm is correct. However, what is *not known* is the asymptotic distribution of the parameter estimates when constraints are active. This is a "non-regular" situation where MLEs might have strange asymptotic behavior. We cannot generally assume normality and use the standard error estimates to construct confidence intervals or calculate significance levels. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> From: Ravi Varadhan Sent: Wednesday, September 07, 2011 1:37 PM To: 'gpet...@uark.edu'; 'nas...@uottawa.ca'; 'r-help@r-project.org' Subject: Hessian matrix issue Yes, numDeriv::hessian is very accurate. So, I normally take the output from the optimizer, *if it is a local optimum*, and then apply numDeriv::hessian to it. I, then, compute the standard errors from it. However, it is important to know that you have obtained a local optimum. In fact, you need the gradient and Hessian to actually verify this - the first and second order KKT conditions. However, this is tricky in *constrained* optimization problems. If you have constraints, and at least one of the constraints is active at the best parameter estimate, then the gradient of the original objective function need not be close to zero and the hessian is also incorrect. If you employ an augmented Lagrangian approach (see alabama::auglag), then you can obtain the correct gradient and Hessian at best parameter estimate. These gradients and Hessian correspond to the modified objective function that includes a Lagrangian term augmented by a quadratic penalty term. They can be used to check the KKT conditions. Here is a simple example: require(alabama) fr <- function(x) { ## Rosenbrock Banana function x1 <- x[1] x2 <- x[2] 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 } grr <- function(x) { ## Gradient of 'fr' x1 <- x[1] x2 <- x[2] c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1), 200 * (x2 - x1 * x1)) } p0 <- c(0, 0) ans1 <- optim(par=p0, fn=fr, gr=grr, upper=c(0.9, Inf), method="L-BFGS-B", hessian=TRUE) grr(ans1$par) # this will not be close to zero ans1$hessian # Using an augmented Lagrangian optimizer hcon <- function(x) 0.9 - x[1] hcon.jac <- function(x) matrix(c(-1, 0) , 1, 2) ans2 <- auglag(par=p0, fn=fr, gr=grr, hin=hcon, hin.jac=hcon.jac) ans2$gradient # this will be close to zero ans2$hessian However, it is not known whether the standard errors obtained from this Hessian are asymptotically valid. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Hessian matrix issue
Yes, numDeriv::hessian is very accurate. So, I normally take the output from the optimizer, *if it is a local optimum*, and then apply numDeriv::hessian to it. I, then, compute the standard errors from it. However, it is important to know that you have obtained a local optimum. In fact, you need the gradient and Hessian to actually verify this - the first and second order KKT conditions. However, this is tricky in *constrained* optimization problems. If you have constraints, and at least one of the constraints is active at the best parameter estimate, then the gradient of the original objective function need not be close to zero and the hessian is also incorrect. If you employ an augmented Lagrangian approach (see alabama::auglag), then you can obtain the correct gradient and Hessian at best parameter estimate. These gradients and Hessian correspond to the modified objective function that includes a Lagrangian term augmented by a quadratic penalty term. They can be used to check the KKT conditions. Here is a simple example: require(alabama) fr <- function(x) { ## Rosenbrock Banana function x1 <- x[1] x2 <- x[2] 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 } grr <- function(x) { ## Gradient of 'fr' x1 <- x[1] x2 <- x[2] c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1), 200 * (x2 - x1 * x1)) } p0 <- c(0, 0) ans1 <- optim(par=p0, fn=fr, gr=grr, upper=c(0.9, Inf), method="L-BFGS-B", hessian=TRUE) grr(ans1$par) # this will not be close to zero ans1$hessian # Using an augmented Lagrangian optimizer hcon <- function(x) 0.9 - x[1] hcon.jac <- function(x) matrix(c(-1, 0) , 1, 2) ans2 <- auglag(par=p0, fn=fr, gr=grr, hin=hcon, hin.jac=hcon.jac) ans2$gradient # this will be close to zero ans2$hessian However, it is not known whether the standard errors obtained from this Hessian are asymptotically valid. Best, Ravi. ------- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Gradients in optimx
Hi Reuben, I am puzzled to note that the gradient check in "optimx" does not work for you. Can you send me a reproducible example so that I can figure this out? John - I think the best solution for now is to issue a "warning" rather than an error message, when the numerical gradient is not sufficiently close to the user-specified gradient. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Gradient function in OPTIMX
Hi Kathie, The gradient check in "optimx" checks if the user-specified gradient (at starting parameters) is within roughly 1.e-05 * (1 + fval) of the numerically computed gradient. It is likely that you have correctly coded up the gradient, but still there can be significant differences b/w numerical and exact gradients. This can happen when the gradients are very large. I would check this again separately as follows: require(numDeriv) mygrad <- gr.fy(theta0) numgrad <- grad(x=theta0, func=fy) # fy = the objective function whose gradient gr.fy computes cbind(mygrad, numgrad) all.equal(mygrad, numgrad) Can you report these gradients to us? In "optimx", we should probably change this into a "warning" rather than a "stop". Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to generate a random variate that is correlated with a given right-censored random variate?
Hi, Here is an update. I implemented this bivariate lognormal approach. It works well in simulations when I generated the marginal [T] from a lognormal distribution and independently censored it. It, however, does not do well when I generate from a marginal [T] that is Weibull or a distribution not well approximated by a lognormal. How to make the approach more robust to distributional assumptions on the marginal of [T]? I am thinking that a bivariate copula approach is called for. The [U] margin can be standard lognormal as before, and the [T] margin needs to be a flexible distribution. What kind of bivariate copula might work? Then, how to generate from the conditional distribution [U | T]? Any thoughts? Thanks, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> From: Ravi Varadhan Sent: Friday, August 26, 2011 2:56 PM To: r-help@r-project.org Subject: How to generate a random variate that is correlated with a given right-censored random variate? Hi, I have a right-censored (positive) random variable (e.g. failure times subject to right censoring) that is observed for N subjects: Y_i, i = 1, 2, ..., N. Note that Y_i = min(T_i, C_i), where T_i is the true failure time and C_i is the censoring time. Let us assume that C_i is independent of T_i. Now, I would like to generate another random variable U_i, i = 1, 2, ..., N, which is correlated with T. In other words, I would like to generate U from the conditional distribution [U | T=t]. One approach might be to assume that the joint distn [T, U] is bivariate lognormal. So, the marginals [T] and [U], as well as the conditional [U | T] are also lognormal. I can estimate the marginal [T] using the right-censored data Y (assuming independent censoring). For example, I might use survival::survreg to do this. Then, I assume that U is standard lognormal (mean = 0, var = 1). Now, I only need to assume a value for correlation parameter, r, and I can then sample from the conditional [U | T=t] which is also a lognormal (parametrized by r). Does this sound right? Are there better/simpler ways to do this? Thanks very much for any hints. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to generate a random variate that is correlated with a given right-censored random variate?
Hi, I have a right-censored (positive) random variable (e.g. failure times subject to right censoring) that is observed for N subjects: Y_i, i = 1, 2, ..., N. Note that Y_i = min(T_i, C_i), where T_i is the true failure time and C_i is the censoring time. Let us assume that C_i is independent of T_i. Now, I would like to generate another random variable U_i, i = 1, 2, ..., N, which is correlated with T. In other words, I would like to generate U from the conditional distribution [U | T=t]. One approach might be to assume that the joint distn [T, U] is bivariate lognormal. So, the marginals [T] and [U], as well as the conditional [U | T] are also lognormal. I can estimate the marginal [T] using the right-censored data Y (assuming independent censoring). For example, I might use survival::survreg to do this. Then, I assume that U is standard lognormal (mean = 0, var = 1). Now, I only need to assume a value for correlation parameter, r, and I can then sample from the conditional [U | T=t] which is also a lognormal (parametrized by r). Does this sound right? Are there better/simpler ways to do this? Thanks very much for any hints. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
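A sketch of the conditional draw under the bivariate lognormal assumption described above (mu and sigma stand for the meanlog and sdlog of [T], e.g. as estimated by survreg with dist = "lognormal"; r is the assumed correlation of the logs; the example values are illustrative):

# log T ~ N(mu, sigma^2), log U ~ N(0, 1), cor(log T, log U) = r,
# so log U | log T = s is N(r * (s - mu) / sigma, 1 - r^2)
rU.given.T <- function(t, mu, sigma, r) {
  z <- (log(t) - mu) / sigma
  rlnorm(length(t), meanlog = r * z, sdlog = sqrt(1 - r^2))
}
set.seed(123)
Tvals <- rlnorm(5, meanlog = 1, sdlog = 0.5)   # stand-in failure times
rU.given.T(Tvals, mu = 1, sigma = 0.5, r = 0.6)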
Re: [R] Scaling problem in optim()
There is an error in your call to `optim'. The `lower' bound is incorrect: lower=c(rep(-1,n),1e-5,1e-5,1e5) This should be: lower=c(rep(-1,n),1e-5,1e-5,1e-5) I am not sure if this fixes the problem, but it is worth a try. I do not understand your scaling. From the lower and upper bounds it seems like the (n+2)th parameter is the only one that needs to be scaled by a scale factor of 100. After you have fixed the error in `lower' and the scaling, you might also want to try the "optimx" package that runs a bunch of optimization algorithms. Ravi. ------- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
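For the scaling part, optim's built-in `parscale' control is the standard mechanism; a toy sketch (the function and magnitudes are illustrative, not from the original post):

badfn <- function(p) (p[1] - 1)^2 + (p[2] / 1e4 - 1)^2   # optimum at c(1, 1e4)
optim(c(0, 0), badfn, method = "BFGS")$par               # may stop far from it
optim(c(0, 0), badfn, method = "BFGS",
      control = list(parscale = c(1, 1e4)))$par          # well-scaled internally

Dividing each parameter by its typical magnitude makes the internal search space roughly isotropic, which is what gradient-based methods implicitly assume.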
[R] optimization problems
Kathie, It is very difficult to help without adequate information. What does your objective function look like? Are you maximizing (in which case you have to make sure that the sign of the objective function is correct) or minimizing? Can you try "optimx" with the control option all.methods=TRUE? Hope this is helpful, Ravi. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ridge regression - covariance matrices of ridge
Hi Michael, The coefficients of ridge regression are given by:

\beta^* = (X'X + k I)^{-1} X' y    (1)

where k > 0 is the penalty parameter and I is the identity matrix. The ridge estimates are related to the OLS estimates \beta as follows:

\beta^* = Z \beta    (2)

where

Z = [I + k(X'X)^{-1}]^{-1}    (3)

Let \Sigma and \Sigma^* be the variance-covariance matrices of \beta and \beta^*, respectively. Therefore,

\Sigma^* = Z \Sigma Z'    (4)

In other words, for a fixed k, you can obtain the covariance matrix of \beta^* by making use of (3) and (4). You can read the original paper by Hoerl and Kennard (Technometrics 1970) for more details. Hope this is helpful, Ravi. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
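A sketch of equations (1)-(4) on simulated data (illustrative; a no-intercept model, with \sigma^2 estimated from the OLS residuals):

set.seed(42)
n <- 50; p <- 3; k <- 0.1
X <- matrix(rnorm(n * p), n, p)
y <- X %*% c(1, -1, 0.5) + rnorm(n)
XtX.inv <- solve(crossprod(X))
beta <- XtX.inv %*% crossprod(X, y)       # OLS estimates
Z <- solve(diag(p) + k * XtX.inv)         # eq. (3)
beta.star <- Z %*% beta                   # eq. (2); identical to eq. (1)
s2 <- drop(crossprod(y - X %*% beta)) / (n - p)
Sigma <- s2 * XtX.inv                     # var-cov of the OLS estimates
Sigma.star <- Z %*% Sigma %*% t(Z)        # eq. (4)
Sigma.star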
Re: [R] How to capture console output in a numeric format
I did think of this solution, Keith, but I am generally uncomfortable (maybe "paranoid" is a better word) with the use of `<<-'. Perhaps, my fear is unjustified in this particular situation. Thanks, Ravi. ------- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Keith Jewell Sent: Friday, June 24, 2011 11:49 AM To: r-h...@stat.math.ethz.ch Subject: Re: [R] How to capture console output in a numeric format If you don't want the information as character, why are you printing it rather than storing it in a matrix? Why not something along the lines of this... fr <- function(x) { ## Rosenbrock Banana function on.exit(aMatrix <<- rbind(aMatrix, cbind(x1, x2, f))) x1 <- x[1] x2 <- x[2] f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 f } aMatrix <- NULL ans <- optim(c(-1.2,1), fr) aMatrix HTH Keith J -Original Message- "Ravi Varadhan" wrote in message news:2f9ea67ef9ae1c48a147cb41be2e15c303e...@dom-eb-mail1.win.ad.jhu.edu... Thank you very much, Jim. That works! I did know that I could process the character strings using regex, but was also wondering if there was a direct way to get this. Suppose, in the current example I would like to obtain a 3-column matrix that contains the parameters and the function value: fr <- function(x) { ## Rosenbrock Banana function on.exit(print(cbind(x1, x2, f))) x1 <- x[1] x2 <- x[2] f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 f } fvals <- capture.output(ans <- optim(c(-1.2,1), fr)) Now, I need to tweak your solution to get the 3-column matrix. It would be nice if there was a more direct way to get the numerical output, perhaps a numeric option in capture.output(). Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to capture console output in a numeric format
Thank you very much, Jim. That works! I did know that I could process the character strings using regex, but was also wondering if there was a direct way to get this. Suppose, in the current example I would like to obtain a 3-column matrix that contains the parameters and the function value: fr <- function(x) { ## Rosenbrock Banana function on.exit(print(cbind(x1, x2, f))) x1 <- x[1] x2 <- x[2] f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 f } fvals <- capture.output(ans <- optim(c(-1.2,1), fr)) Now, I need to tweak your solution to get the 3-column matrix. It would be nice, if there was a more direct way to get the numerical output, perhaps a numeric option in capture.output(). Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: jim holtman [mailto:jholt...@gmail.com] Sent: Friday, June 24, 2011 10:48 AM To: Ravi Varadhan Cc: r-help@r-project.org Subject: Re: [R] How to capture console output in a numeric format try this: > fr <- function(x) { ## Rosenbrock Banana function +on.exit(print(f)) +x1 <- x[1] +x2 <- x[2] +f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 +f + } > > fvals <- capture.output(ans <- optim(c(-1.2,1), fr)) > # convert to numeric > fvals <- as.numeric(sub("^.* ", "", fvals)) > > fvals [1] 24.20 7.095296 15.08 4.541696 [5] 6.029216 4.456256 8.879936 7.777856 [9] 4.728125 5.167901 4.21 4.437670 [13] 4.178989 4.326023 4.070813 4.221489 [17] 4.039810 4.896359 4.009379 4.077130 [21] 4.020798 3.993600 4.024586 4.117625 [25] 3.993115 3.976081 3.971089 4.023905 [29] 3.980807 3.9525770000 3.932179 3.935345 On Fri, Jun 24, 2011 at 10:39 AM, Ravi Varadhan wrote: > Hi, > > I would like to know how to capture the console output from running an > algorithm for further analysis. I can capture this using capture.output() > but that yields a character vector. I would like to extract the actual > numeric values. Here is an example of what I am trying to do. > > fr <- function(x) { ## Rosenbrock Banana function > on.exit(print(f)) > x1 <- x[1] > x2 <- x[2] > f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 > f > } > > fvals <- capture.output(ans <- optim(c(-1.2,1), fr)) > > Now, `fvals' contains character elements, but I would like to obtain the > actual numerical values. How can I do this? > > Thanks very much for any suggestions. > > Best, > Ravi. > > --- > Ravi Varadhan, Ph.D. > Assistant Professor, > Division of Geriatric Medicine and Gerontology School of Medicine Johns > Hopkins University > > Ph. (410) 502-2619 > email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to capture console output in a numeric format
Hi, I would like to know how to capture the console output from running an algorithm for further analysis. I can capture this using capture.output() but that yields a character vector. I would like to extract the actual numeric values. Here is an example of what I am trying to do. fr <- function(x) { ## Rosenbrock Banana function on.exit(print(f)) x1 <- x[1] x2 <- x[2] f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2 f } fvals <- capture.output(ans <- optim(c(-1.2,1), fr)) Now, `fvals' contains character elements, but I would like to obtain the actual numerical values. How can I do this? Thanks very much for any suggestions. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu<mailto:rvarad...@jhmi.edu> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.