[R] problem with nls starting values
Hi

I would like to fit a non-linear regression to the following data:

quantiles <- seq(0.05, 0.95, 0.05)
slopes <- c( 0.00e+00,  1.622074e-04,  3.103918e-03,  2.169135e-03,
             9.585523e-04,  1.412327e-03,  4.288103e-05, -1.351171e-04,
             2.885810e-04, -4.574773e-04, -2.368968e-03, -3.104634e-03,
            -5.833970e-03, -6.011945e-03, -7.737697e-03, -8.203058e-03,
            -7.809603e-03, -6.623985e-03, -9.414477e-03)
plot(slopes ~ quantiles)

I want to fit two models: asymptotic decay and logistic decay (s-shaped). I tried the self-starting functions SSlogis and SSasymp like this:

dframe <- data.frame(cbind(slopes, quantiles))
names(dframe) <- c("slopes", "quantiles")
summary(mod1 <- nls(slopes ~ SSlogis(quantiles, Asym, xmid, scal), data = dframe))
summary(mod2 <- nls(slopes ~ SSasymp(quantiles, Asym, resp0, lrc), data = dframe))

and I also tried to specify the starting values myself. But I usually don't even get nls() started: it always stops with a singular-gradient error or some other related message ("stopped after 50 iterations", etc.). When I leave out some values from the middle quantiles I manage to fit a 3-parameter logistic model, but if I use all the values it no longer works. I then simulated perfect asymptotic-decay data and tried to fit an nls() with the correct parameter values, but that won't work either.

What am I doing wrong? Any help would be most appreciated.

Best
benedikt

--
Benedikt Gehr
Ph.D. Student
Institute of Evolutionary Biology and Environmental Studies
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich
Office 13 J 36b
Phone: +41 (0)44 635 49 72
http://www.ieu.uzh.ch/staff/phd/gehr.html

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
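For reference, the asymptotic-decay model can also be fitted without a self-starting function by writing the SSasymp formula out and supplying rough starting values read off the plot. This is a minimal sketch (the starting values below are guesses, not fitted results), wrapped in try() since convergence on these data is not guaranteed:

```r
# Sketch: fit SSasymp's model Asym + (R0 - Asym) * exp(-exp(lrc) * x) by hand
# with guessed starting values; try() keeps a singular-gradient error from
# aborting the script.
quantiles <- seq(0.05, 0.95, 0.05)
slopes <- c( 0.00e+00,  1.622074e-04,  3.103918e-03,  2.169135e-03,
             9.585523e-04,  1.412327e-03,  4.288103e-05, -1.351171e-04,
             2.885810e-04, -4.574773e-04, -2.368968e-03, -3.104634e-03,
            -5.833970e-03, -6.011945e-03, -7.737697e-03, -8.203058e-03,
            -7.809603e-03, -6.623985e-03, -9.414477e-03)
dframe <- data.frame(slopes, quantiles)
fit <- try(nls(slopes ~ Asym + (R0 - Asym) * exp(-exp(lrc) * quantiles),
               data = dframe,
               start = list(Asym = -0.012, R0 = 0.002, lrc = 0)),  # guesses
           silent = TRUE)
if (inherits(fit, "try-error")) "no convergence" else coef(fit)
```

Inspecting the result of try() rather than letting nls() stop with an error makes it easier to loop over several candidate start values.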
Re: [R] problem with nls starting values
Thanks for your reply. I agree that an lm model would fit just as well; however, the expectation from a mechanistic point of view is a non-linear relationship. Also, when I simulate data as in

y_val <- 115 - 118 * exp(-0.12 * (seq(1, 100) + rnorm(100, 0, 0.8)))
x_val <- seq(1, 100)
plot(y_val ~ x_val)
summary(mod1 <- nls(y_val ~ a - b * exp(-c * x_val),
                    start = list(a = 115, b = 118, c = 0.12)))

I do not get convergence. I obviously must be doing something wrong.

Thanks for the help
Benedikt

On 27.09.2012 20:00, Bert Gunter wrote:
> My guess: you probably are overfitting your data. A straight line does
> about as well as anything except for the 3 high-leverage points, which
> the minimization is probably having trouble with.
> -- Bert
Re: [R] problem with nls starting values
Now I feel very silly! I swear I was trying this for a long time and it didn't work. Now that I have closed R and restarted it, it works on my machine as well.

So is the only problem that my model is overparameterized for the data I have? Shouldn't it still be possible to fit an nls() to these data?

Thanks for the help

On 27.09.2012 21:27, Berend Hasselman wrote:
> y_val <- 115 - 118 * exp(-0.12 * (seq(1, 100) + rnorm(100, 0, 0.8)))
> x_val <- seq(1, 100)
> plot(y_val ~ x_val)
> summary(mod1 <- nls(y_val ~ a - b * exp(-c * x_val),
>                     start = list(a = 115, b = 118, c = 0.12)))
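For the record, a reproducible version of the simulation above (only set.seed() added so the noise is repeatable) does converge and recovers the true parameters closely:

```r
# Same simulation as in the thread, made reproducible with set.seed();
# starting at the true values, nls() converges without trouble.
set.seed(42)
x_val <- 1:100
y_val <- 115 - 118 * exp(-0.12 * (x_val + rnorm(100, 0, 0.8)))
mod1 <- nls(y_val ~ a - b * exp(-c * x_val),
            start = list(a = 115, b = 118, c = 0.12))
round(coef(mod1), 2)   # close to a = 115, b = 118, c = 0.12
```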
[R] order a data frame by date with order
Hi

I have a rather large data frame (7000 rows with 28 columns) which I want to sort by date. Below is an example. The date column is called DT, is a factor, and looks like this:

> class(res.merge$DT)
[1] "factor"
> head(res.merge$DT)
[1] 17.3.2012 13:54:02 17.3.2012 14:00:07 17.3.2012 14:30:25 17.3.2012 15:01:15
[5] 17.3.2012 15:32:14 17.3.2012 16:01:29
2530 Levels: 1.4.2012 00:00:52 1.4.2012 00:30:29 ... 9.5.2012 15:30:50

res.merge is the unordered data frame. Now I want to order the data frame with:

res.ordered <- res.merge[order(as.POSIXct(as.character(res.merge$DT),
                                          format = "%d.%m.%Y %H:%M:%S")), ]

This does work; however, for some reason there are always two entries that go to the end of the data frame for no obvious reason (see below; 9.5.2012 is the most recent date). This is the case for different data frames, and the two entries at the end are always 25.3.2012 02:00:xx and 25.3.2012 02:30:xx. Can anybody tell me what the problem is? Any help is most appreciated.

Best
Benedikt

> res.ordered[2545:2549,]
                     DT Typ  NOD     Day_s DOW_s   Time_s     Long      Lat
2547  9.5.2012 14:30:56 GPS 1893  9.5.2012    We 14:30:00 7.452218 46.43579
2548  9.5.2012 15:02:09 GPS 1893  9.5.2012    We 15:00:35 7.451983 46.43583
2549  9.5.2012 15:30:50 GPS 1893  9.5.2012    We 15:30:00 7.451973 46.43597
1845 25.3.2012 02:00:18 GPS 1848 25.3.2012    So 02:00:01 7.454266 46.45414
1846 25.3.2012 02:30:16 GPS 1848 25.3.2012    So 02:30:00 7.454413 46.45437
(further columns Height, TOF, Status, FO_GPS, GPS_N, AOT, Day_e, DOW_e,
Time_e, BV, Temp, SOG, Heading, SAE, HAE, BW_2, BW_3, X.. omitted)
Re: [R] order a data frame by date with order
Hi,

Many thanks for your answer. If I set tz = "GMT", it does the job. Great!

thanks, cheers
Benedikt

On 16.05.2012 12:20, jholtman wrote:
> Is this a daylight saving time problem? Check your timezone and see when
> it occurred; these times might not be legal.
>
> Sent from my iPad
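The behaviour can be reproduced directly: 02:00-03:00 on 25 March 2012 falls in the Central European spring-forward gap, so those wall-clock times never existed. On most platforms as.POSIXct() then returns NA, and order() puts NAs last by default, which is exactly where the two stray rows ended up. Parsing in a DST-free zone such as GMT sidesteps the gap (a sketch; the NA behaviour in the gap is platform-dependent):

```r
# The same timestamp string parsed in a DST-observing zone vs. in GMT;
# in the spring-forward gap the first typically comes back NA.
gap <- as.POSIXct("25.3.2012 02:00:18", format = "%d.%m.%Y %H:%M:%S",
                  tz = "Europe/Zurich")
gmt <- as.POSIXct("25.3.2012 02:00:18", format = "%d.%m.%Y %H:%M:%S",
                  tz = "GMT")
c(gap = is.na(gap), gmt = is.na(gmt))
```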
[R] true time series lags behind fitted values in arima model
Hi

I am fitting an arima model to some time series X. When comparing the fitted values of the model to the true time series, I noticed that the true series lags one time step behind the fitted values of the arima model, and this is the case for any model. When I did a simple linear regression using lm() to check, I found the same result: the true series lags behind the fitted values. I don't understand this; what am I doing wrong? Below I copied some code to demonstrate the issue.

## Analysis using arima()
X <- c(6.841067, 6.978443, 6.984755, 7.007225, 7.161198, 7.169790, 7.251534,  # the true time series
       7.336429, 7.356600, 7.413271, 7.404165, 7.480869, 7.498686, 7.429809,
       7.302747, 7.168251, 7.124798, 7.094881, 7.119132, 7.049250, 6.961049,
       7.013442, 6.915243, 6.758036, 6.665078, 6.730523, 6.702005, 6.905522,
       7.005191, 7.308986)

model100 <- arima(X, order = c(1, 0, 0), include.mean = TRUE, method = "ML")  # the arima model
resid100 <- residuals(model100)
Xfit100 <- X - resid100                                 # the fitted values
ts.plot(cbind(Xfit100, X), col = c(2, 4), main = "ARIMA(1,0,0)")  # true ts vs fitted values
legend(20, 7.5, c("true values", "fitted values"), pch = c(16, 16),
       col = c("blue", "red"))

## Same analysis using lm()
Y <- X[2:30]       # X(t): the series without the first value
X1 <- X[1:29]      # X(t-1)
time <- seq(1978, 2006)
model1 <- lm(Y ~ X1)
summary(model1)
arima.intercept <- model1[[1]][[1]] / (1 - model1[[1]][[2]]); arima.intercept
model100[[1]][[2]]
plot(Y ~ time, type = "b")   # plot the two series
points(fitted(model1) ~ time, type = "b", col = "green")
legend(1995, 7.5, c("true values", "fitted values"), pch = c(16, 16),
       col = c("black", "green"))

Thank you very much for the help
best wishes
Benedikt
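What the plot shows is expected behaviour rather than a bug: X - residuals(model) gives the one-step-ahead forecasts, and for an AR(1) each forecast is built from the previous observation, so with an AR coefficient near 1 the fitted series essentially reproduces the data shifted by one step. A sketch verifying this on the series above:

```r
# Verify that arima()'s implicit fitted values are one-step-ahead forecasts:
# for an AR(1), fitted[t] = mu + a * (X[t-1] - mu) for t >= 2 (the first
# value uses the stationary distribution instead).
X <- c(6.841067, 6.978443, 6.984755, 7.007225, 7.161198, 7.169790, 7.251534,
       7.336429, 7.356600, 7.413271, 7.404165, 7.480869, 7.498686, 7.429809,
       7.302747, 7.168251, 7.124798, 7.094881, 7.119132, 7.049250, 6.961049,
       7.013442, 6.915243, 6.758036, 6.665078, 6.730523, 6.702005, 6.905522,
       7.005191, 7.308986)
fit <- arima(X, order = c(1, 0, 0), include.mean = TRUE, method = "ML")
a  <- coef(fit)[["ar1"]]
mu <- coef(fit)[["intercept"]]
fitted_vals <- X - residuals(fit)
manual <- mu + a * (X[-length(X)] - mu)   # one-step-ahead forecasts, t >= 2
max(abs(fitted_vals[-1] - manual))        # ~ 0 up to numerical noise
```

So the apparent one-step lag is a property of one-step-ahead prediction, shared by the lm(Y ~ X1) fit for exactly the same reason.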
[R] problem with arima() function
Hi

I posted this problem yesterday but didn't get a reply, so I'm trying again today. I hope someone can help me with this. Thank you very much for the help
cheers
Benedikt

I would like to use arima() to find the best arima model for a time series. The default in arima() apparently is to use conditional sum of squares to find the starting values and then use ML for the rest (as described on the help page). Now, using the default may lead to error messages saying "non-stationary ar part in CSS". When changing the default from CSS-ML to ML only, the minimization works. As far as I understand, arima doesn't require stationarity, but apparently CSS does.

Can anyone tell me what exactly the CSS method does? And why is CSS-ML the default in R: out of efficiency reasons? ML and CSS-ML give the exact same estimates when applied to the same data. I tried to find out on Google, but I couldn't find anything useful or understandable to me as a non-statistician.

Here are some data that cause the error message:

X <- c(6.841067, 6.978443, 6.984755, 7.007225, 7.161198, 7.169790, 7.251534,
       7.336429, 7.356600, 7.413271, 7.404165, 7.480869, 7.498686, 7.429809,
       7.302747, 7.168251, 7.124798, 7.094881, 7.119132, 7.049250, 6.961049,
       7.013442, 6.915243, 6.758036, 6.665078, 6.730523, 6.702005, 6.905522,
       7.005191, 7.308986)

model.examp <- arima(X, order = c(7, 0, 0), include.mean = TRUE)                 # gives an error
model.examp <- arima(X, order = c(7, 0, 0), include.mean = TRUE, method = "ML")  # gives no error

Any help on this would be most appreciated. Many thanks for the help
best wishes
Benedikt
[R] non-stationary ar part in css
Hi

I would like to use arima() to find the best arima model for my time series. The default in arima() apparently is to use conditional sum of squares to find the starting values and then ML (as described on the help page). Now, using the default may lead to error messages saying "non-stationary ar part in CSS". When changing the default to ML only, the minimization works. As far as I understand, arima doesn't require stationarity, but apparently CSS does.

Can anyone tell me what exactly the CSS method does? And why is CSS-ML the default in R: out of efficiency reasons? ML and CSS-ML give the exact same estimates when applied to the same data. I tried to find out on Google, but I couldn't find anything useful or understandable to me as a non-statistician.

Here are some data that cause the error message:

X <- c(6.841067, 6.978443, 6.984755, 7.007225, 7.161198, 7.169790, 7.251534,
       7.336429, 7.356600, 7.413271, 7.404165, 7.480869, 7.498686, 7.429809,
       7.302747, 7.168251, 7.124798, 7.094881, 7.119132, 7.049250, 6.961049,
       7.013442, 6.915243, 6.758036, 6.665078, 6.730523, 6.702005, 6.905522,
       7.005191, 7.308986)

model.examp <- arima(X, order = c(7, 0, 0), include.mean = TRUE)                 # gives an error
model.examp <- arima(X, order = c(7, 0, 0), include.mean = TRUE, method = "ML")  # gives no error

Any help on this would be most appreciated. Many thanks for the help
best wishes
Benedikt
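In brief (a hedged summary; see ?arima for the authoritative description): method = "CSS" conditions on the first p observations and minimizes the sum of squared one-step prediction errors, which is fast but can propose AR coefficients outside the stationary region, triggering the error above; "ML" maximizes the exact likelihood with the AR part constrained to stationarity; and the default "CSS-ML" merely uses CSS to obtain starting values for the ML fit, so when both succeed they agree. A sketch of the comparison on these data:

```r
# Compare the default CSS-ML start-up with a pure ML fit; try() captures
# the "non-stationary ar part from CSS" error instead of stopping.
X <- c(6.841067, 6.978443, 6.984755, 7.007225, 7.161198, 7.169790, 7.251534,
       7.336429, 7.356600, 7.413271, 7.404165, 7.480869, 7.498686, 7.429809,
       7.302747, 7.168251, 7.124798, 7.094881, 7.119132, 7.049250, 6.961049,
       7.013442, 6.915243, 6.758036, 6.665078, 6.730523, 6.702005, 6.905522,
       7.005191, 7.308986)
fit_ml  <- arima(X, order = c(7, 0, 0), include.mean = TRUE, method = "ML")
fit_css <- try(arima(X, order = c(7, 0, 0), include.mean = TRUE),  # default CSS-ML
               silent = TRUE)
inherits(fit_css, "try-error")   # reported to fail on these data
```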
[R] the arima()-function and AICc
Hi

I'm trying to fit arima models with the arima() function and I have two questions.

##
## 1.
##
I have n observations in my time series. Now, no matter what arima(p,d,q) model I fit, I always get n residuals. How is that possible? For example: if I try this out myself on an AR(1) and calculate the fitted values from the estimated coefficients, I can only calculate n-1 residuals. What is the first residual, residuals(model)[1]? Does it have something to do with the mean? Here is what I did (X is the time series I'm analysing):

X <- c(6.770705, 6.842524, 6.881832, 6.896694, 7.004967, 7.065750, 7.139447,
       7.227818, 7.274945, 7.333097, 7.350763, 7.404271, 7.426247, 7.394454,
       7.303650, 7.176984, 7.170972, 7.113736, 7.154326, 7.136678, 7.103826,
       7.146775, 7.084247, 7.016302, 6.784539, 6.705846, 6.709989, 6.851557,
       6.973064, 7.232223)

## The AR(1) model
model10 <- arima(X, order = c(1, 0, 0), include.mean = TRUE)
mu <- model10[[1]][[2]]
a  <- model10[[1]][[1]]

## Get the fitted values and residuals of the arima(1,0,0) model
fitted <- vector(mode = "numeric")
E <- vector(mode = "numeric")
for (i in 2:30) {
  fitted[i] <- a * (X[i - 1] - mu) + mu
  E[i] <- X[i] - fitted[i]
}
fitted
E                      # innovations
residuals(model10)     # compare with the residuals from the arima() model

##
## 2.
##
I want to calculate the AICc value for model selection. Is there a way to calculate the AICc from the model output without doing it manually? I guess I could write a function myself somehow, but it's always nice to have a built-in function against which to compare what one does. I know that there has been some discussion of this on the list, but I didn't find anything that helped me. I tried AICctab from the bbmle package, but I couldn't figure out how to get my output from arima into the function; I tried making a list of the logLik(model) output and several other things, but it never worked.

Any help with this is most appreciated
many thanks in advance
cheers
Beni
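On the AICc question: there is no built-in AICc for arima objects, but the small-sample correction is a one-liner, AICc = AIC + 2k(k+1)/(n-k-1), with k the number of estimated parameters (including the innovation variance). A hedged sketch of a helper function (`aicc` is an illustrative name, not a base R function):

```r
# Sketch of an AICc helper for stats::arima fits; k counts all estimated
# coefficients plus sigma^2, and n is taken from the fit's nobs component.
aicc <- function(fit) {
  k <- length(coef(fit)) + 1              # + 1 for the innovation variance
  n <- fit$nobs
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

X <- c(6.770705, 6.842524, 6.881832, 6.896694, 7.004967, 7.065750, 7.139447,
       7.227818, 7.274945, 7.333097, 7.350763, 7.404271, 7.426247, 7.394454,
       7.303650, 7.176984, 7.170972, 7.113736, 7.154326, 7.136678, 7.103826,
       7.146775, 7.084247, 7.016302, 6.784539, 6.705846, 6.709989, 6.851557,
       6.973064, 7.232223)
fit10 <- arima(X, order = c(1, 0, 0), include.mean = TRUE)
c(AIC = AIC(fit10), AICc = aicc(fit10))   # AICc is AIC plus a positive penalty
```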
[R] Re: an alternative to R for nonlinear stat models
Hi Paul

Oh, OK, sorry. I send you below a copy of the R code I was using. It's very possible that my programming is slow as well; I'm very happy to learn. Sorry the code is a bit long, but it is structured like this and contains three main functions plus the optimization:

1. Input (I didn't send the data)
2. Function to calculate the cell probabilities for the multinomial likelihood
3. Function to calculate the multinomial
4. Function to be optimized
5. Then I set the starting values
6. Optimization using mle2

hope that helps
cheers
Beni

----- Paul Hiemstra p.hiems...@geo.uu.nl wrote: -----
To: benedikt.g...@ieu.uzh.ch
From: Paul Hiemstra p.hiems...@geo.uu.nl
Date: 16.06.2010 09:36
Cc: r-help@r-project.org, da...@otter-rsch.com
Subject: Re: [R] an alternative to R for nonlinear stat models

On 06/16/2010 07:35 AM, benedikt.g...@ieu.uzh.ch wrote:
> Hi
> I implemented the age-structure model in Gove et al. (2002) in R, which is
> a nonlinear statistical model. However, running the model in R was very
> slow, so Dave Fournier suggested the AD Model Builder software package and
> helped me implement the model there. ADMB was incredibly fast in running
> the model: while running the model in R took 5-10 minutes, depending on
> the settings, in ADMB it took 1-2 seconds! I'm reporting this so that
> people who have performance issues with nonlinear statistical models in R
> will know that there is a good free alternative for more difficult
> problems. There is also a help platform equivalent to the one for R, and
> the people running it are extremely helpful. I hope this might help
> someone
> cheers Beni

Hi Beni,

Thanks for posting information that might be useful for people on the list. The only thing is that without a little more detail on how exactly you implemented things in R, we are left to guess whether the performance issue is a problem of R or of your particular implementation. There are ways of implementing R code where the first takes minutes and the second 1-2 seconds. Furthermore, you are giving us no option to defend R ;).

cheers,
Paul

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +3130 274 3113 Mon-Tue
Phone: +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
[R] Re: an alternative to R for nonlinear stat models
Hi

I think my first answer doesn't seem to have gone through, because of the attachment, so I copy-pasted my last answer including the code below. Sorry about that.

Hi Paul

Oh, OK, sorry. I send you below a copy of the R code I was using. It's very possible that my programming is slow as well; I'm very happy to learn. Sorry the code is a bit long, but it is structured like this and contains three main functions plus the optimization:

1. Input (I didn't send the data)
2. Function to calculate the cell probabilities for the multinomial likelihood
3. Function to calculate the multinomial
4. Function to be optimized
5. Then I set the starting values
6. Optimization using mle2

hope that helps
cheers
Beni

###########################################################################
###########################################################################
# library(boot)
# library(bbmle)
# library(demogR)

## Simulated harvest numbers of a cohort harvested over 30 years
## (including random noise)
pop    <- N.true   # true population size
cohort <- C        # harvest matrix
R <- nrow(cohort)
L <- ncol(cohort)
h <- matrix(rep(Q, R), ncol = a, byrow = TRUE)          # estimated harvest rates
S <- matrix(rep(P[-1], R), ncol = a - 1, byrow = TRUE)  # survival rates

##
## Function to calculate the cell probabilities
##
predprob <- function(S = S, h = h) {
  R <- nrow(cohort)
  L <- ncol(cohort)
  p <- matrix(rep(0, R * L), ncol = L)   # matrix of cell probabilities
  p[R, 1] <- h[R, 1]
  p[1, L] <- h[1, L]
  ay <- rep(0, L + R + 1)  # vector of last cell probabilities (1 - sum(rest)); Gove (2002)
  ay[L + R] <- 1 - h[1, L]
  ay[2] <- 1 - h[R, 1]
  index <- 3  # ay[1] = NULL, ay[2] = 1-h[R,1], ay[L+R] = 1-h[1,L], ay[L+R+1] = NULL
  for (i in -(R - 2):(L - 2)) {   # calculating the cell probabilities after Gove (2002)
    p[row(p) == col(p) - i] <- odiag(h, i) *
      c(1, cumprod((1 - odiag(h[1:(R - 1), 1:(L - 1)], i)) *
                   odiag(S[1:(R - 1), ], i)))
    ay[index] <- 1 - sum(odiag(p[1:R, ], i))
    index <- index + 1
  }
  p <- rbind(p, ay[1:L])
  p <- cbind(p, ay[length(ay):(L + 1)])
  return(p)
}
predprob(S, h)   # test the function

## Function to calculate the multinomial - from Ben Bolker
## multinom WITHOUT ROUNDING, without error-checking
dmnom <- function(x, prob, log = FALSE) {
  r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
  if (log) r else exp(r)
}

###########################################################################
## Function to be optimized
###########################################################################
mfun <- function(logN, S = S, h = h, cohort = cohort) {
  L <- ncol(cohort)
  R <- nrow(cohort)
  d <- matrix(rep(NA, (R + 1) * (L + 1)), ncol = L + 1)  # last row and last col of the matrix
  d[1:R, 1:L] <- log(cohort)   # the values to be optimized - log-transformed data!
  index <- 1
  index1 <- R + L
  for (i in -(R - 1):(L - 1)) {   # values for the top row
    if (i <= -(R - L)) {
      d[R + 1, index + 1] <- log(exp(logN[index]) - sum(odiag(cohort, i)))
    } else if (i > -(R - L) && i < 1) {
      d[index1, L + 1] <- log(exp(logN[index]) - sum(odiag(cohort, i)))
    } else if (i > 0) {
      d[index1, L + 1] <- log(exp(logN[index]) - sum(odiag(cohort, i)))
    }
    index <- index + 1
    index1 <- index1 - 1
  }
  pp <- predprob(S, h)
  lhood <- 0   # the negative log-likelihood, to be minimized
  for (i in -(R - 1):(L - 1)) {
    lhood <- lhood + (-dmnom(exp(odiag(d, i)), prob = odiag(pp, i), log = TRUE))
  }
  return(lhood)
}

logN <- log(rep(1000, R + L - 1))   # test the function with test values
mfun(logN, S, h, cohort)

##
## Starting values for the optimization
##
N <- rep(500, R + L - 1)   # optimization: N - x1 - x2 - x3
svec <- log(N)             # transform to avoid constraints (x > 0)
names(svec) <- paste("logN", 1:(R + L - 1), sep = "")
parnames(mfun) <- names(svec)

##
## Lower bounds for the L-BFGS-B method
##
index <- 1
lower <- rep(0, R + L - 1)
for (i in -(R - 1):(L - 1)) {
  lower[index] <- log(sum(odiag(cohort, i)))
  index <- index + 1
}
lower <- lower + 0.001
names(lower) <- names(svec)
exp(lower)   # check lower bounds

##
## Optimization with mle2 (bounded optimization)
##
m1 <- mle2(mfun, start = svec,
           method = "L-BFGS-B", lower = lower, upper = 10,
           # lower must exceed log(sum(odiag(cohort, i))); smaller values produce NaNs
           vecpar = TRUE,
           data = list(S = S, h = h, cohort = cohort))
coef(m1)
summary(m1)
Nest <- exp(coef(m1))   # estimated cohort sizes for row and column 1
[R] an alternative to R for nonlinear stat models
Hi

I implemented the age-structure model in Gove et al. (2002) in R, which is a nonlinear statistical model. However, running the model in R was very slow, so Dave Fournier suggested the AD Model Builder (ADMB) software package and helped me implement the model there. ADMB was incredibly fast in running the model: while running the model in R took 5-10 minutes, depending on the settings, in ADMB it took 1-2 seconds!

I'm reporting this so that people who have performance issues with nonlinear statistical models in R will know that there is a good free alternative for more difficult problems. There is also a help platform equivalent to the one for R, and the people running it are extremely helpful.

I hope this might help someone
cheers
Beni
[R] running admb from R using system()
Hi

I'm trying to run an ADMB model from R using the system() command. The ADMB model runs fine when run from the ADMB command line or from emacs. However, when I try it with system(), R crashes every time. I tried the R command line and the RGui, and also different computers, and it crashes in all of them.

What I do is first set the directory where I keep the ADMB template file with the corresponding ADMB data file, and then invoke the system command either step by step or directly. I can build the model, but when executing the model .exe file, R crashes.

## setting the working directory
setwd("Documents and Settings\\Beni User\\Desktop\\Transfer\\Model Fournier\\admb models\\simulated data\\stochastic\\pooled age classes")

## building the model - this works fine
system('makeadm stocpool')

## running the model - makes R crash
system('stoc.exe')

### or directly like this:
setwd("Documents and Settings\\Beni User\\Desktop\\Transfer\\Model Fournier\\admb models\\simulated data\\stochastic\\pooled age classes")
system("./stocpool")   # makes R crash right away

Does anyone know why this happens and how I can make this work? Thanks a lot for the help!!
cheers
Beni
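A defensive pattern that often narrows such problems down (a hedged sketch; the binary name here is hypothetical): check that the executable actually exists in the current working directory and capture its exit status, rather than letting a missing or incompatible binary take the session down. On Windows, shell() runs the command through the command interpreter, which some console programs expect.

```r
# Sketch: run an external binary defensively from R. "stocpool.exe" is a
# hypothetical name standing in for the compiled ADMB model.
exe <- "stocpool.exe"
if (file.exists(exe)) {
  status <- system(shQuote(exe), wait = TRUE)  # returns the exit code
  cat("exit status:", status, "\n")
} else {
  cat(exe, "not found in", getwd(), "\n")
}
```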
[R] read in data file into R
Hi

I'm trying to read a data file with output from another program (ADMB) into R for further analysis, but I have not been very successful. The file extension of the data file is .rep, but it also doesn't help when I change it to .txt. I have two problems/questions:

1. The file is a single line of n values, each separated by a single space or tab. These values represent a time series of length n. How can I make a numeric vector from these data? When I use the read.table() command to read in the file, R produces a list of as many objects as there are values (n); however, what I need is a vector of length n in order to work with the data. When I try to coerce the list into a single vector using as.vector(), this doesn't work. When I specify sep = "\n", I get a list where all n values are treated as one value and I can't extract single values.

2. Related to the issue above: when I have a data file which consists of two objects, one a matrix and the other a vector, can I read the file into R all at once as a list with two objects and then extract the matrix and the vector and work with them? Or is it necessary to first make two files, one for each object?

Below I have copied a subset of my data files for illustration. This seems a very silly question, but I just didn't manage to do it.
Thanks a lot for the help cheers beni This is a subset of my data file for 1.: Time series of reconstructed populations 3709.17 2660.93 2045.36 2090.33 2096.93 2205.65 2083.72 1797.53 1884.61 1946.59 2101.66 2220.03 2080.04 2097.07 2332.9 2325.47 2091.67 2091.54 2072.38 2025.31 1919.54 1781.95 1867.96 1685.12 1826.31 1654.25 1593.84 1430.96 1539.89 1587.35 1472.32 1737.02 1510.37 1570.15 1723.21 1755.3 1843.85 1829.2 1880.63 1916.79 1945.86 2096.64 2246.67 2101.16 2134.39 2018.1 2174.04 This is a subset of the data file for 2.: Reconstructed population 203.026 200.005 205.206 217.36 279.415 750.965 495.041 91.3615 162.004 147.748 156.499 492.444 463.284 222.768 74.0028 116.643 106.379 303.677 468.042 208.478 180.442 53.282 83.9828 194.375 460.216 210.619 168.867 129.918 38.3631 135.857 461.88 207.097 170.601 121.584 93.5413 80.3142 474.857 207.846 167.749 122.833 87.5406 98.5 479.117 213.686 168.355 120.779 88.4396 101.233 480.269 215.603 173.085 121.216 86.961 102.94 483.206 216.121 174.638 124.622 87.2753 102.538 486.657 217.443 175.058 125.739 89.7275 102.608 490.516 218.996 176.128 126.042 90.5324 104.401 494.019 220.732 177.386 126.813 90.7501 105.676 497.345 222.308 178.793 127.718 91.305 106.327 500.797 223.805 180.07 128.731 91.9571 106.979 504.331 225.359 181.282 129.65 92.6863 107.701 507.892 226.949 182.54 130.523 93.3482 108.507 511.458 228.551 183.829 131.429 93.9768 109.296 515.039 230.156 185.127 132.357 94.629 110.054 518.65 231.767 186.426 133.291 95.2967 110.818 522.291 233.393 187.732 134.227 95.9696 111.595 525.957 235.031 189.048 135.167 96.6434 112.381 529.648 236.681 190.375 136.115 97.32 113.171 533.364 238.342 191.711 137.07 98.0025 113.964 537.106 240.014 193.057 138.032 98.6905 114.763 541.61 241.698 194.411 139.001 99.3832 115.569 545.435 243.725 195.775 139.976 100.081 116.38 549.312 245.446 197.417 140.958 100.783 117.197 553.294 247.19 198.811 142.14 101.49 118.019 557.349 248.982 200.224 143.144 102.341 118.847 time series of the 
reconstructed population
1855.98 1545.1 1286.75 1188.6 1143.84 1135.02 1159.33 1171.61 1180.07 1188.4 1197.23 1206.61 1215.38 1223.8 1232.34 1241.01 1249.76 1258.54 1267.36 1276.25 1285.21 1294.23 1303.31 1312.45 1321.66 1331.67 1341.37 1351.11 1360.94 1370.89
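For question 1, scan() is the natural tool: it reads a one-line, whitespace-separated file straight into a numeric vector, unlike read.table(), which builds a one-row data frame. A self-contained sketch (a tiny stand-in file is created here in place of the real .rep file):

```r
# scan() returns a plain numeric vector from a whitespace-separated file.
f <- tempfile(fileext = ".rep")
writeLines("3709.17 2660.93 2045.36 2090.33", f)   # stand-in for file.rep
x <- scan(f, what = numeric(), quiet = TRUE)
x   # numeric vector of length 4
```

For question 2 (one file holding a matrix and a vector), one approach is to readLines() the whole file, locate the two header lines, scan() each piece separately via scan(text = paste(lines, collapse = "\n")), and then shape the matrix part with matrix(..., byrow = TRUE).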
[R] optimize a joint likelihood with mle2
Hi, I'm trying to maximize a joint likelihood made of two likelihoods (likelihood 1 and likelihood 2) in mle2, where the parameters I estimate in likelihood 2 feed into likelihood 1. In likelihood 1 I estimate the vector logN of length 37, and in likelihood 2 I estimate a vector s of length 8. The values of s from likelihood 2 are used inside likelihood 1. I have two questions:

##1 I managed to write the joint likelihood (it's very long, which is why I don't post it here), but I don't know how to write the correct arguments for the mle2 call. The likelihood function to be optimized is called mfun(logN, s, h=h, cohort=cohort), and I want to maximize over the vectors logN and s. I define the starting values for logN as svec1 and for s as svec2, then I call the optimization:

m1 <- mle2(mfun, start=list(svec1=logN, svec2=s), method="L-BFGS-B",
           lower=lower, upper=10, vecpar=TRUE, data=list(h=h, cohort=cohort))

but I get the error:

Error in mle2(mfun, start = list(svec1 = logN, svec2 = s), method = "L-BFGS-B", :
  some named arguments in 'start' are not arguments to the specified log-likelihood function

How can I write this call correctly?

##2 Besides, I'm using the constrained optimization method L-BFGS-B, but I would like the bounds to apply only to logN and not to s. Is that possible at all? I guess I could use the same limits for s, since s falls in the range of the logN values, but how can I tell mle2 that I am combining two likelihoods?

Thanks so much for the help Benedikt
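One common workaround for both questions is to concatenate the two parameter vectors into a single vector, split it back apart inside the objective function, and pass per-element bounds (with -Inf/Inf for the unconstrained block). The sketch below uses base R's optim as a stand-in for mle2's "L-BFGS-B" call, and toy normal likelihoods in place of the real (unposted) mfun; all names here (nll_joint, y1, y2) are invented for illustration.

```r
## Sketch: joint optimization of two parameter blocks via one concatenated
## vector. Toy likelihoods stand in for the real "likelihood 1" / "likelihood 2".
set.seed(1)
y1 <- rnorm(37, mean = 5)   # data entering likelihood 1
y2 <- rnorm(8,  mean = 2)   # data entering likelihood 2

nll_joint <- function(par) {
  logN <- par[1:37]           # first block: the logN vector
  s    <- par[38:45]          # second block: the s vector
  ## in the real model, s would also enter the first term
  nll1 <- -sum(dnorm(y1, mean = logN, sd = 1, log = TRUE))
  nll2 <- -sum(dnorm(y2, mean = s,    sd = 1, log = TRUE))
  nll1 + nll2                 # joint negative log-likelihood
}

start <- c(rep(5, 37), rep(2, 8))
## per-element bounds: constrain only the logN block, leave s unbounded
lower <- c(rep(0, 37),  rep(-Inf, 8))
upper <- c(rep(10, 37), rep( Inf, 8))
fit <- optim(start, nll_joint, method = "L-BFGS-B",
             lower = lower, upper = upper)
```

Since L-BFGS-B accepts vector-valued lower/upper arguments, bounding some elements while leaving others at -Inf/Inf answers question 2 directly; the same idea carries over to mle2's lower/upper arguments with vecpar=TRUE.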
[R] likelihood profiles for multiple parameters
Hi, I'm using mle2 for optimization of a multinomial likelihood. The model fit is not very good, and I would like to look at the likelihood profiles. However, I'm optimizing many parameters (~40), so using the function profile() takes forever and is not practical. How can I calculate approximate likelihood profiles? Is there a function for this in R? Thanks a lot for the help Beni
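When full profiling is too slow, a cheap stand-in is the quadratic (Wald) approximation: evaluate the Hessian of the negative log-likelihood at the MLE once, and treat each profile as a parabola whose curvature comes from the inverse Hessian. This is only an approximation (it assumes the likelihood surface is roughly quadratic near the optimum), but it costs one Hessian instead of ~40 re-optimizations. A minimal sketch with base R's optim on a toy two-parameter normal model; all names are invented:

```r
## Quadratic (Wald) approximation to likelihood profiles: use the curvature
## of the negative log-likelihood at the MLE instead of re-optimizing at
## each fixed parameter value. Toy normal model for illustration.
set.seed(1)
x <- rnorm(200, mean = 3, sd = 2)

nll <- function(par) {               # par = c(mean, log sd)
  -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
}
fit <- optim(c(0, 0), nll, hessian = TRUE)

vcov_hat <- solve(fit$hessian)       # inverse Hessian ~ covariance of the estimates
se <- sqrt(diag(vcov_hat))

## Approximate profile for parameter j: a parabola in the deviance
j <- 1
grid <- seq(fit$par[j] - 3*se[j], fit$par[j] + 3*se[j], length.out = 41)
approx_dev <- ((grid - fit$par[j]) / se[j])^2   # ~ chi-square_1 deviance

## A Wald 95% CI falls straight out of the same quantities
ci <- fit$par[j] + c(-1, 1) * qnorm(0.975) * se[j]
```

Comparing approx_dev against a handful of true profile points tells you where the quadratic approximation is trustworthy and where a real profile is still needed.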
Re: [R] using near-zero probabilities in optimization
  p[row(p) == col(p) - i] <- odiag(h,i)*c(1, cumprod((1-odiag(h[1:(R-1),1:(L-1)],i))*odiag(S[1:(R-1),],i)))
  ay[index] <- 1-sum(odiag(p[1:R,],i))
  index <- index+1
}
p <- rbind(p, ay[1:L])
p <- cbind(p, ay[length(ay):(L+1)])
return(p)
}
predprob(S,h)  # test the function

## Function to calculate the multinomial - from Ben Bolker
## multinom WITHOUT ROUNDING, without error-checking
dmnom <- function(x, prob, log=FALSE) {
  r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
  if (log) r else exp(r)
}

## function to be optimized
mfun <- function(logN, S=S, h=h, cohort=cohort) {
  L <- ncol(cohort)
  R <- nrow(cohort)
  d <- matrix(rep(NA,(R+1)*(L+1)), ncol=L+1)  # last row and last col of the matrix
  d[1:R,1:L] <- log(cohort)                   # are the values to be optimized - log-transformed data!
  index <- 1
  index1 <- (R+L)
  for (i in -(R-1):(L-1)) {                   # values for the top row
    if (i >= -(R-L)) {
      d[(R+1),(index+1)] <- log(exp(logN[index])-sum(odiag(cohort,i)))
    } else if (i < -(R-L) & i < 1) {
      d[index1,(L+1)] <- log(exp(logN[index])-sum(odiag(cohort,i)))
    } else if (i < 0) {
      d[index1,(L+1)] <- log(exp(logN[index])-sum(odiag(cohort,i)))
    }
    index <- index+1
    index1 <- index1-1
  }
  pp <- predprob(S,h)
  lhood <- 0
  # the likelihood function - to be maximized
  for (i in -(R-1):(L-1)) {
    lhood <- lhood + (-dmnom(exp(odiag(d,i)), prob=odiag(pp,i), log=TRUE))
  }
  return(lhood)
}

logN <- log(rep(1000,(R+L-1)))  # test the function with test values
mfun(logN,S,h,cohort)

## Starting values for the optimization
N <- rep(500,(R+L-1))  # starting values for N-x1-x2-x3
svec <- log(N)         # transform to avoid constraints (x>0)
names(svec) <- paste("logN", 1:(R+L-1), sep="")
parnames(mfun) <- names(svec)

## Lower bounds for the L-BFGS-B method
index <- 1
lower <- rep(0,(R+L-1))
for (i in -(R-1):(L-1)) {
  lower[index] <- log(sum(odiag(cohort,i)))
  index <- index+1
}
lower <- lower+0.001
names(lower) <- names(svec)
exp(lower)  # check lower bounds

## Optimization with mle2 - use bounded optimization
m1 <- mle2(mfun, start=svec, method="L-BFGS-B", lower=lower, upper=10,
           # lower must be larger than sum(odiag(x)) - if smaller, NaNs are produced
           vecpar=TRUE, data=list(S=S,h=h,cohort=cohort))
coef(m1)
Nest <- exp(coef(m1))  # estimated cohort sizes for row and column 1

## Analyse the goodness of fit
Nya <- matrix(rep(0,length(cohort)), ncol=L)  # project the N estimates into a matrix
Nya[1,1:L] <- Nest[R:length(Nest)]
Nya[2:R,1] <- rev(Nest[1:(R-1)])
stot <- (1-Q[1:(a-1)])*P[2:a]  # derived survival - product of (1-h(i)) and s(i)
smat <- matrix(rep(stot,R), ncol=L-1, byrow=T)
for (i in 1:ncol(smat)) {  # project the N estimates through time
  Nya[2:R,i+1] <- Nya[1:(R-1),i]*smat[1:(R-1),i]
}

## Expected harvest values compared with observed harvest numbers
exp <- Nya*h
obs <- cohort
gof <- ((obs-exp)^2)/exp  # goodness of fit for the single estimates
gof.stat <- sum(gof)      # overall GOF statistic
p.value <- 1-pchisq(gof.stat,(length(obs)-1))  # p-value with appropriate degrees of freedom
p.value
plot(obs~exp, main="Observed harvest numbers vs expected from the model",
     xlim=c(0,200), ylim=c(0,200))
abline(0,1)

## Plots
## Plot true population trend - compare with estimated trend
plot(rowSums(N.true)~seq(1:y), ylim=c(0,5000), col="red", pch=17,
     main="True vs estimated population trend")
lines(rowSums(N.true)~seq(1:y), col="red")
points(rowSums(Nya)~seq(1:y), ylim=c(0,5000), col="blue", pch=16)
lines(rowSums(Nya)~seq(1:y), col="blue")
## END

Thanks a lot for the help Benedikt

On Tue, 9 Mar 2010 15:34:09 +0000 (UTC), Ben Bolker bol...@ufl.edu wrote:
> Benedikt Gehr benedikt.gehr at ieu.uzh.ch writes:
> Hi there I am using mle2
[R] using near-zero probabilities in optimization
Hi there, I am using mle2 for a multinomial likelihood optimization problem. My function works fine when I'm using simulated data; however, the cell probabilities of the true data for the multinomial likelihood are sometimes very small (in some cases 0.00...), and the estimated point estimates fit the true values quite poorly. Is there a way to handle near-zero probabilities in maximum likelihood optimization? Thanks a lot for the help best wishes Benedikt
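A common device for this is to keep the whole computation on the log scale and floor the cell probabilities at a tiny epsilon before taking logs, so that a zero (or numerically underflowed) cell never produces -Inf inside the likelihood and derails the optimizer. A minimal sketch, using the lgamma form of the multinomial density; the floor value 1e-10 is an arbitrary choice for illustration, not a recommendation:

```r
## Guard against log(0) in a multinomial negative log-likelihood by flooring
## the cell probabilities at a small epsilon and renormalizing.
safe_multinom_nll <- function(x, prob, eps = 1e-10) {
  prob <- pmax(prob, eps)
  prob <- prob / sum(prob)           # renormalize so the probabilities sum to 1
  -(lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1)))
}

x     <- c(10, 5, 0, 85)
p_bad <- c(0.1, 0.05, 0, 0.85)       # a zero cell: plain log(p) would give -Inf
nll <- safe_multinom_nll(x, p_bad)
is.finite(nll)                       # finite, so the optimizer can proceed
```

For probabilities well away from zero the floor is inactive and the function agrees with -dmultinom(..., log=TRUE), so it can be dropped into an existing objective without changing the fit where the data are informative.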
[R] function odiag(): assigning values to off-diagonal
hi, I'm trying to use the function odiag(x) for matrix calculations. I would like to assign new values to an off-diagonal of a matrix. With the diag(x) function I can write something like

p <- matrix(seq(1:12), ncol=4)
p.new <- matrix(rep(0,12), ncol=4)
diag(p.new) <- diag(p)
p.new

But this won't work with odiag. How can I turn odiag(x) into something like diag(x), so that I can assign new values to off-diagonals? In the end I would like to do something like this:

h <- matrix(rep(c(0.2,0.3,0.3,1),4), ncol=4, byrow=T)  # harvest rates for the 3 different years
S <- matrix(rep(c(0.9,0.8,0.7),4), ncol=3, byrow=T)    # survival rates
for (i in 0:(L-2)) {
  odiag(p,i) <- odiag(h,i)*c(1, cumprod((1-odiag(h[1:R,1:(L-1)],i))*odiag(S[1:R,1:(L-1)],i)))
  p[Rp-i,Lp] <- 1-sum(odiag(p[1:Rp,-Lp],i))
  ## add one more value
  p[1,L] <- h[1,L]
  p[2,Lp] <- 1-p[1,L]
}

Does anybody know how I can do this? Thank you very much for the help Best wishes Ben
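R lets you make any accessor assignable by defining a replacement function whose name ends in `<-`; that is exactly how `diag(x) <- value` works. Since odiag is not in base R, the sketch below defines both a plain getter and the matching setter, assuming odiag(x, i) means the diagonal offset by i (i > 0 above the main diagonal, i < 0 below):

```r
## Minimal off-diagonal accessor plus replacement function, so that
## `odiag(p, i) <- value` works the same way `diag(p) <- value` does.
odiag <- function(x, i = 0) {
  x[col(x) - row(x) == i]
}
`odiag<-` <- function(x, i = 0, value) {
  x[col(x) - row(x) == i] <- value   # logical indexing picks the offset diagonal
  x                                  # replacement functions must return the object
}

p <- matrix(0, nrow = 4, ncol = 4)
odiag(p, 1) <- c(10, 20, 30)   # fill the first superdiagonal
odiag(p, 0) <- 1               # main diagonal, scalar is recycled
p
```

The only special requirements are that the last argument of the setter must be named `value` and that it must return the modified object; R then rewrites `odiag(p, 1) <- v` as `` p <- `odiag<-`(p, 1, value = v) `` automatically.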
[R] using mle2 for multinomial model optimization
Hi there, I'm trying to find the MLE of a multinomial model, L(N,h,S|x). The only quantity I want to estimate is N, which enters the number of successes for the last cell probability; these successes are given by p^(N-x1-x2-...-xi). All the other parameters (i.e. h and S) I know from somewhere else. Here is what I've tried so far for an imaginary data set:

cohort <- c(50,54,50)   # imaginary harvest numbers of a cohort harvested over 3 years
l <- length(cohort)+1   # nr of cell probabilities
h <- c(0.2,0.3,1)       # harvest rates for the 3 different years
S <- c(0.9,0.8)         # survival rates

mfun <- function(d) {
  S <- S  # survival rate
  h <- h  # harvest rate
  l <- length(d)
  p <- rep(0,l)  # cell probabilities
  p[1] <- h[1]
  prod <- (1-h[1])*S[1]
  for (i in 2:(l-1)) {
    p[i] <- prod*h[i]
    prod <- prod*(1-h[i])*S[i]
  }
  p[l] <- 1-sum(p[-l])  # last cell probability
  -dmultinom(exp(d), prob=p, log=TRUE)  # exp(d) back-transforms the estimates
}

c <- c(cohort,100)  # untransformed initial values for the optimization; 100 = N-x1-x2-x3
nvec <- c(rep("x",l-1),"N")  # names for the initial vector
nrs <- c(1:(l-1),1)          # numbers for the initial vector
svec <- log(c)  # transform the data to avoid constraints (x>0)
names(svec) <- paste(nvec,nrs,sep="")
parnames(mfun) <- names(svec)
m1 <- mle2(mfun, start=svec, vecpar=TRUE, fixed=svec[1:(l-1)])  # only estimate N-x1-x2-x3; everything else is fixed
coef(m1)

The function mfun works, but the mle2 call won't. How can I fix this? Thanks so much for your help Beni
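When everything except one quantity is known, an alternative to mle2's fixed-parameter bookkeeping is a closure: write the negative log-likelihood as a function of the single free parameter (here log of the never-harvested remainder), let it capture the known h, S, and cohort from the enclosing environment, and hand it to base R's one-dimensional optimize(). This is a sketch of that technique, not the mle2 fix itself; it reuses the toy numbers from the post and the lgamma form of the multinomial density so the objective stays smooth in the non-integer cell count:

```r
## One-dimensional MLE for the "remainder" cell of a multinomial, with all
## other parameters fixed via a closure. Toy numbers from the post.
cohort <- c(50, 54, 50)   # harvest numbers over 3 years
h <- c(0.2, 0.3, 1)       # known harvest rates
S <- c(0.9, 0.8)          # known survival rates

nll_N <- function(logNrem) {            # logNrem = log(N - x1 - x2 - x3)
  l <- length(cohort) + 1
  p <- numeric(l)
  p[1] <- h[1]
  prod <- (1 - h[1]) * S[1]
  for (i in 2:(l - 1)) {
    p[i] <- prod * h[i]
    prod <- prod * (1 - h[i]) * S[i]
  }
  p[l] <- 1 - sum(p[-l])                # last cell probability
  x <- c(cohort, exp(logNrem))          # observed cells plus the free remainder
  -(lgamma(sum(x) + 1) + sum(x * log(p) - lgamma(x + 1)))
}

fit <- optimize(nll_N, interval = c(0, 10))
exp(fit$minimum)   # estimated number never harvested
```

Because only one parameter is free, optimize() needs no starting values, names, or parnames() plumbing, which sidesteps the error entirely.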
[R] compute maximum likelihood estimator for a multinomial function
Hi there, I am trying to learn how to compute an MLE in R for a multinomial negative log-likelihood function. I am using the book by B. Bolker, "Ecological Models and Data in R", chapter 6: "Likelihood and all that". But he has no example for multinomial functions. What I did is the following. I first defined a function for the negative log-likelihood:

log.multi <- function(d,p){
  -sum(dmultinom(d, prob=p, log=T))
}

Then I defined the parameters: d <- data (in my case counts of animals found dead in different age classes 1:19), and p <- d/sum(d), or some values close to the analytical solution n(i)/N for the probabilities. Then I used the optim function like this:

opt1 <- optim(par=p, fn=log.multi, d=d)

This works perfectly when I use some simple imaginary numbers instead of my real data, e.g. d <- c(3,6,9). However, for my real data it doesn't work, because the differences between the counts seem to be too large. Here are my data:

d <- count
d
 [1]  7 15  9  3  6 13 11 17 15 36 38 52 49 53 33 20  6  2  1

I get the error message: probabilities can't be negative nor all 0. So I wanted to try the mle or mle2 functions of the packages stats4 and bbmle, respectively. But I just can't get them running. I tried it like this:

library(bbmle)
log.multi <- function(d, p.est){
  -sum(dmultinom(x=d, prob=p.est, log=T))
}
d <- c(3,5,7)
p.est <- d/sum(d)
m1 <- mle2(minuslogl=log.multi, start=list(p.est=p.est), data=list(N=sum(d), d=d))

I get the error: Error in dmultinom(x = d, prob = p.est, log = T) : x[] and prob[] must be equal length vectors. And no matter how I try (and I have tried for a long time!!), I can't figure out what the problem is. I'm sure it's a very silly mistake, but I don't know what it is. I think it must have something to do with how I define the parameters. Could anyone help me with this?
Thank you so much in advance Benedikt
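The "negative probabilities" error happens because optim's unconstrained search steps outside the probability simplex. The standard fix is to optimize unconstrained real-valued scores and map them to valid probabilities with a softmax; note also that for a plain multinomial the MLE has the closed form d/sum(d), so the numeric fit serves as a sanity check. A hedged sketch in base R (softmax, nll, and z are invented names), using the age-class counts from the post:

```r
## Reparameterize multinomial probabilities with a softmax so the optimizer
## can never leave the simplex; compare the numeric fit to the analytic MLE.
d <- c(7, 15, 9, 3, 6, 13, 11, 17, 15, 36, 38, 52, 49, 53, 33, 20, 6, 2, 1)

softmax <- function(z) { e <- exp(z - max(z)); e / sum(e) }

nll <- function(z) {
  p <- softmax(c(0, z))      # fix the first score at 0 for identifiability
  -dmultinom(d, prob = p, log = TRUE)
}

fit <- optim(rep(0, length(d) - 1), nll, method = "BFGS",
             control = list(maxit = 500))
p_hat <- softmax(c(0, fit$par))
round(cbind(analytic = d / sum(d), numeric = p_hat), 4)
```

Pinning one score at zero matters: without it the softmax is invariant to adding a constant to all scores, so the problem would have a flat direction and 19 parameters for 18 degrees of freedom.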