Re: [R] strucchange Fstats() example
On Thu, 31 May 2012, Mabille, Geraldine wrote: Thanks a lot for your answer Achim, this helped a lot. I have done a lot of reading, following your recommendations and I think I have a better idea of what I should use. My dataset contains binary data on survival of the calf depending on the mass of the mother. We know that the probability of survival of the calf should vary according to the mass of the mother: 3 groups of mass expected, lower survival of the calves for small and large females, best survival of calf for intermediate-sized females. I want to identify at which masses those changes in survival occur. I think the code I need to use in order to test what I want is something of that type: gmass-gefp(Success ~ Mass, family=binomial, fit=glm, order.by=~Mass) The first question I have is what is the difference between testing the gmass model shown up compared to the gmass2 model below? gmass2-gefp(Success ~ 1, family=binomial, fit=glm, order.by=~Mass) gmass2 will be appropriate if under the null hypothesis there is a constant probability of survival and under the alternative there are different groups each of which has a constant probability of survival. gmass will be appropriate if under the null hypothesis the logit of the probability of survival increases/decreases linearly with mass - and under the alternative there are different groups each of which has a probability that increases/decreases with mass. From your description above I suspect that gmass2 is what you are looking for. The second question, which is related I think to the first one is whether it makes sense to plot the gmass model as aggregate=FALSE, knowing that we have a single parameter in the model (Mass), In the gmass2 model, there is a single parameter (probability of survival) which may or may not change across mass. In the gmass model, there are two parameters (intercept and slope) which may or may not vary across mass. and this parameter is also the parameter we use as the order.by= parameter plot(gmass, functional=meanL2BB, aggregate=FALSE) I think the whole point around questions 1 and 2 is that I don't understand the interpretation of the intercept in the gmass model??? Third question: how to choose the proper functional? Always difficult because there are no strong optimality results. If you test just a single parameter (i.e., gmass2), then I would expect that most functionals should lead to similar performance. I would try plot(gmass2, functional = maxBB, aggregate = FALSE) plot(gmass2, functional = meanL2BB) plot(gmass2, functional = supLM(0.1)) where the latter two would probably have somewhat higher power. But the rangeBB might also work rather well here... I have seen that you discuss that in your CSDA(2006 2003) papers and, in the 2006 paper you say: in situations where there is a shift in the parameters and then a second shift back, it is advantageous to aggregate over time using the range and . Which means, if I understand well that rangeBB would be adapted to the kind of test I want to perform. However, since I want to determine the timing of the peaks, I need my functional to produce a time series plot, for example like meanL2BB does. Do you think I can use meanL2BB functional in my case or should I compute an home-made functional which would use the range of efp but with comp applied first and time after (is this possible???). When you test only a single parameter, then you don't need to aggregate across the components anyway because there is just a single one. Fourth question: is it OK to only make a visual estimation of the breakpoints from the peaks seen on the graph after plotting the efp or should I use the breakpoints() function to properly date the breakpoints??? I'm not sure this breakpoints() function can be applied to binary data? In the fxregime package there is an unexported gbreakpoints() function that can be used. For example, if your data is in a data.frame d: gbp - fxregime:::gbreakpoints(Success ~ 1, data = d, order.by = d$Mass, fit = glm, family = binomial, h = 20, ic = BIC) plot(gbp) breakpoints(gbp) Note that the code is not optimized for glm and hence can be rather slow - extremely slow if you have many observations (more than a few hundred). How many observations do you have? Fifth question: I have noticed that the p-values I obtain after performing the sctest(gmass, functional=meanL2BB) for example are a bit different depending on if I introduce family=binomial as an argument in my gefp() call. Should I use this argument or is it used by default when you specify fit=glm??? If you want to fit a binomial model yes. Otherwise a linear regression is used. (The 'family' is simply passed on to glm.) Last question, you said in your previous message that I could look at the maxstat_test() from package coin for an interesting nonparametric alternative but I think this package does not allow
Re: [R] strucchange Fstats() example
Hi and thanks again for your answer. I have just a last question regarding the choice of the functional...if you have time to help on that again. I have tried running the sctest using the functionals you recommended sctest(gmass2,functional=meanL2BB) sctest(gmass2,functional=rangeBB) sctest(gmass2,functional=supLM(0.1)) and I have a question regarding the meanL2BB test. The P-value obtained for this test is P=0.18 and the plot obtained is given as an attachment to the message. What I don't understand is why the overall test for meanL2BB is not significant while we see on the graph at least two peaks above the red line? The other two tests (rangeBB and supLM) are both significant at P=0.03. Thanks again for your precious help, Geraldine -Original Message- From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at] Sent: 1. juni 2012 08:56 To: Mabille, Geraldine Cc: R-help@r-project.org Subject: RE: [R] strucchange Fstats() example On Thu, 31 May 2012, Mabille, Geraldine wrote: Thanks a lot for your answer Achim, this helped a lot. I have done a lot of reading, following your recommendations and I think I have a better idea of what I should use. My dataset contains binary data on survival of the calf depending on the mass of the mother. We know that the probability of survival of the calf should vary according to the mass of the mother: 3 groups of mass expected, lower survival of the calves for small and large females, best survival of calf for intermediate-sized females. I want to identify at which masses those changes in survival occur. I think the code I need to use in order to test what I want is something of that type: gmass-gefp(Success ~ Mass, family=binomial, fit=glm, order.by=~Mass) The first question I have is what is the difference between testing the gmass model shown up compared to the gmass2 model below? gmass2-gefp(Success ~ 1, family=binomial, fit=glm, order.by=~Mass) gmass2 will be appropriate if under the null hypothesis there is a constant probability of survival and under the alternative there are different groups each of which has a constant probability of survival. gmass will be appropriate if under the null hypothesis the logit of the probability of survival increases/decreases linearly with mass - and under the alternative there are different groups each of which has a probability that increases/decreases with mass. From your description above I suspect that gmass2 is what you are looking for. The second question, which is related I think to the first one is whether it makes sense to plot the gmass model as aggregate=FALSE, knowing that we have a single parameter in the model (Mass), In the gmass2 model, there is a single parameter (probability of survival) which may or may not change across mass. In the gmass model, there are two parameters (intercept and slope) which may or may not vary across mass. and this parameter is also the parameter we use as the order.by= parameter plot(gmass, functional=meanL2BB, aggregate=FALSE) I think the whole point around questions 1 and 2 is that I don't understand the interpretation of the intercept in the gmass model??? Third question: how to choose the proper functional? Always difficult because there are no strong optimality results. If you test just a single parameter (i.e., gmass2), then I would expect that most functionals should lead to similar performance. I would try plot(gmass2, functional = maxBB, aggregate = FALSE) plot(gmass2, functional = meanL2BB) plot(gmass2, functional = supLM(0.1)) where the latter two would probably have somewhat higher power. But the rangeBB might also work rather well here... I have seen that you discuss that in your CSDA(2006 2003) papers and, in the 2006 paper you say: in situations where there is a shift in the parameters and then a second shift back, it is advantageous to aggregate over time using the range and . Which means, if I understand well that rangeBB would be adapted to the kind of test I want to perform. However, since I want to determine the timing of the peaks, I need my functional to produce a time series plot, for example like meanL2BB does. Do you think I can use meanL2BB functional in my case or should I compute an home-made functional which would use the range of efp but with comp applied first and time after (is this possible???). When you test only a single parameter, then you don't need to aggregate across the components anyway because there is just a single one. Fourth question: is it OK to only make a visual estimation of the breakpoints from the peaks seen on the graph after plotting the efp or should I use the breakpoints() function to properly date the breakpoints??? I'm not sure this breakpoints() function can be applied to binary data? In the fxregime package there is an unexported gbreakpoints() function that can be used. For example
Re: [R] strucchange Fstats() example
On Fri, 1 Jun 2012, Mabille, Geraldine wrote: Hi and thanks again for your answer. I have just a last question regarding the choice of the functional...if you have time to help on that again. I have tried running the sctest using the functionals you recommended sctest(gmass2,functional=meanL2BB) sctest(gmass2,functional=rangeBB) sctest(gmass2,functional=supLM(0.1)) and I have a question regarding the meanL2BB test. The P-value obtained for this test is P=0.18 and the plot obtained is given as an attachment to the message. What I don't understand is why the overall test for meanL2BB is not significant while we see on the graph at least two peaks above the red line? This is discussed at the end of Section 5.1 (middle of page 3001) in the 2006 CSDA paper. For the meanL2BB functional the test statistic is the _mean_ of the process, not the _maximum_. Hence, the mean (visualized by the dashed horizontal line) needs to exceed the critical value (red horizontal line) to obtain a significant result. Best, Z The other two tests (rangeBB and supLM) are both significant at P=0.03. Thanks again for your precious help, Geraldine -Original Message- From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at] Sent: 1. juni 2012 08:56 To: Mabille, Geraldine Cc: R-help@r-project.org Subject: RE: [R] strucchange Fstats() example On Thu, 31 May 2012, Mabille, Geraldine wrote: Thanks a lot for your answer Achim, this helped a lot. I have done a lot of reading, following your recommendations and I think I have a better idea of what I should use. My dataset contains binary data on survival of the calf depending on the mass of the mother. We know that the probability of survival of the calf should vary according to the mass of the mother: 3 groups of mass expected, lower survival of the calves for small and large females, best survival of calf for intermediate-sized females. I want to identify at which masses those changes in survival occur. I think the code I need to use in order to test what I want is something of that type: gmass-gefp(Success ~ Mass, family=binomial, fit=glm, order.by=~Mass) The first question I have is what is the difference between testing the gmass model shown up compared to the gmass2 model below? gmass2-gefp(Success ~ 1, family=binomial, fit=glm, order.by=~Mass) gmass2 will be appropriate if under the null hypothesis there is a constant probability of survival and under the alternative there are different groups each of which has a constant probability of survival. gmass will be appropriate if under the null hypothesis the logit of the probability of survival increases/decreases linearly with mass - and under the alternative there are different groups each of which has a probability that increases/decreases with mass. From your description above I suspect that gmass2 is what you are looking for. The second question, which is related I think to the first one is whether it makes sense to plot the gmass model as aggregate=FALSE, knowing that we have a single parameter in the model (Mass), In the gmass2 model, there is a single parameter (probability of survival) which may or may not change across mass. In the gmass model, there are two parameters (intercept and slope) which may or may not vary across mass. and this parameter is also the parameter we use as the order.by= parameter plot(gmass, functional=meanL2BB, aggregate=FALSE) I think the whole point around questions 1 and 2 is that I don't understand the interpretation of the intercept in the gmass model??? Third question: how to choose the proper functional? Always difficult because there are no strong optimality results. If you test just a single parameter (i.e., gmass2), then I would expect that most functionals should lead to similar performance. I would try plot(gmass2, functional = maxBB, aggregate = FALSE) plot(gmass2, functional = meanL2BB) plot(gmass2, functional = supLM(0.1)) where the latter two would probably have somewhat higher power. But the rangeBB might also work rather well here... I have seen that you discuss that in your CSDA(2006 2003) papers and, in the 2006 paper you say: in situations where there is a shift in the parameters and then a second shift back, it is advantageous to aggregate over time using the range and . Which means, if I understand well that rangeBB would be adapted to the kind of test I want to perform. However, since I want to determine the timing of the peaks, I need my functional to produce a time series plot, for example like meanL2BB does. Do you think I can use meanL2BB functional in my case or should I compute an home-made functional which would use the range of efp but with comp applied first and time after (is this possible???). When you test only a single parameter, then you don't need to aggregate across the components anyway because there is just a single one. Fourth question: is it OK to only make a visual estimation
Re: [R] strucchange Fstats() example
Thanks! I had seen this section but hadn't understood that the dashed line represented the mean. Does that mean in my case (two tests hardly significant P=0.03 and one NS) that I should reject the hypothesis of different levels of survival probability for different groups of females?? -Original Message- From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at] Sent: 1. juni 2012 12:49 To: Mabille, Geraldine Cc: R-help@r-project.org Subject: RE: [R] strucchange Fstats() example On Fri, 1 Jun 2012, Mabille, Geraldine wrote: Hi and thanks again for your answer. I have just a last question regarding the choice of the functional...if you have time to help on that again. I have tried running the sctest using the functionals you recommended sctest(gmass2,functional=meanL2BB) sctest(gmass2,functional=rangeBB) sctest(gmass2,functional=supLM(0.1)) and I have a question regarding the meanL2BB test. The P-value obtained for this test is P=0.18 and the plot obtained is given as an attachment to the message. What I don't understand is why the overall test for meanL2BB is not significant while we see on the graph at least two peaks above the red line? This is discussed at the end of Section 5.1 (middle of page 3001) in the 2006 CSDA paper. For the meanL2BB functional the test statistic is the _mean_ of the process, not the _maximum_. Hence, the mean (visualized by the dashed horizontal line) needs to exceed the critical value (red horizontal line) to obtain a significant result. Best, Z The other two tests (rangeBB and supLM) are both significant at P=0.03. Thanks again for your precious help, Geraldine -Original Message- From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at] Sent: 1. juni 2012 08:56 To: Mabille, Geraldine Cc: R-help@r-project.org Subject: RE: [R] strucchange Fstats() example On Thu, 31 May 2012, Mabille, Geraldine wrote: Thanks a lot for your answer Achim, this helped a lot. I have done a lot of reading, following your recommendations and I think I have a better idea of what I should use. My dataset contains binary data on survival of the calf depending on the mass of the mother. We know that the probability of survival of the calf should vary according to the mass of the mother: 3 groups of mass expected, lower survival of the calves for small and large females, best survival of calf for intermediate-sized females. I want to identify at which masses those changes in survival occur. I think the code I need to use in order to test what I want is something of that type: gmass-gefp(Success ~ Mass, family=binomial, fit=glm, order.by=~Mass) The first question I have is what is the difference between testing the gmass model shown up compared to the gmass2 model below? gmass2-gefp(Success ~ 1, family=binomial, fit=glm, order.by=~Mass) gmass2 will be appropriate if under the null hypothesis there is a constant probability of survival and under the alternative there are different groups each of which has a constant probability of survival. gmass will be appropriate if under the null hypothesis the logit of the probability of survival increases/decreases linearly with mass - and under the alternative there are different groups each of which has a probability that increases/decreases with mass. From your description above I suspect that gmass2 is what you are looking for. The second question, which is related I think to the first one is whether it makes sense to plot the gmass model as aggregate=FALSE, knowing that we have a single parameter in the model (Mass), In the gmass2 model, there is a single parameter (probability of survival) which may or may not change across mass. In the gmass model, there are two parameters (intercept and slope) which may or may not vary across mass. and this parameter is also the parameter we use as the order.by= parameter plot(gmass, functional=meanL2BB, aggregate=FALSE) I think the whole point around questions 1 and 2 is that I don't understand the interpretation of the intercept in the gmass model??? Third question: how to choose the proper functional? Always difficult because there are no strong optimality results. If you test just a single parameter (i.e., gmass2), then I would expect that most functionals should lead to similar performance. I would try plot(gmass2, functional = maxBB, aggregate = FALSE) plot(gmass2, functional = meanL2BB) plot(gmass2, functional = supLM(0.1)) where the latter two would probably have somewhat higher power. But the rangeBB might also work rather well here... I have seen that you discuss that in your CSDA(2006 2003) papers and, in the 2006 paper you say: in situations where there is a shift in the parameters and then a second shift back, it is advantageous to aggregate over time using the range and . Which means, if I understand well that rangeBB would be adapted to the kind of test I want
Re: [R] strucchange Fstats() example
On Fri, 1 Jun 2012, Mabille, Geraldine wrote: Thanks! I had seen this section but hadn't understood that the dashed line represented the mean. Does that mean in my case (two tests hardly significant P=0.03 and one NS) that I should reject the hypothesis of different levels of survival probability for different groups of females?? Hard to say with the information given. The evidence does not seem to be overwhelming but given that the tests are asymptotic in nature it might be better to employ a nonparametric test. But this now goes beyond the R-help focus, I guess. If you contact me off-list, I can have a look if you're interested. Best, Z -Original Message- From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at] Sent: 1. juni 2012 12:49 To: Mabille, Geraldine Cc: R-help@r-project.org Subject: RE: [R] strucchange Fstats() example On Fri, 1 Jun 2012, Mabille, Geraldine wrote: Hi and thanks again for your answer. I have just a last question regarding the choice of the functional...if you have time to help on that again. I have tried running the sctest using the functionals you recommended sctest(gmass2,functional=meanL2BB) sctest(gmass2,functional=rangeBB) sctest(gmass2,functional=supLM(0.1)) and I have a question regarding the meanL2BB test. The P-value obtained for this test is P=0.18 and the plot obtained is given as an attachment to the message. What I don't understand is why the overall test for meanL2BB is not significant while we see on the graph at least two peaks above the red line? This is discussed at the end of Section 5.1 (middle of page 3001) in the 2006 CSDA paper. For the meanL2BB functional the test statistic is the _mean_ of the process, not the _maximum_. Hence, the mean (visualized by the dashed horizontal line) needs to exceed the critical value (red horizontal line) to obtain a significant result. Best, Z The other two tests (rangeBB and supLM) are both significant at P=0.03. Thanks again for your precious help, Geraldine -Original Message- From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at] Sent: 1. juni 2012 08:56 To: Mabille, Geraldine Cc: R-help@r-project.org Subject: RE: [R] strucchange Fstats() example On Thu, 31 May 2012, Mabille, Geraldine wrote: Thanks a lot for your answer Achim, this helped a lot. I have done a lot of reading, following your recommendations and I think I have a better idea of what I should use. My dataset contains binary data on survival of the calf depending on the mass of the mother. We know that the probability of survival of the calf should vary according to the mass of the mother: 3 groups of mass expected, lower survival of the calves for small and large females, best survival of calf for intermediate-sized females. I want to identify at which masses those changes in survival occur. I think the code I need to use in order to test what I want is something of that type: gmass-gefp(Success ~ Mass, family=binomial, fit=glm, order.by=~Mass) The first question I have is what is the difference between testing the gmass model shown up compared to the gmass2 model below? gmass2-gefp(Success ~ 1, family=binomial, fit=glm, order.by=~Mass) gmass2 will be appropriate if under the null hypothesis there is a constant probability of survival and under the alternative there are different groups each of which has a constant probability of survival. gmass will be appropriate if under the null hypothesis the logit of the probability of survival increases/decreases linearly with mass - and under the alternative there are different groups each of which has a probability that increases/decreases with mass. From your description above I suspect that gmass2 is what you are looking for. The second question, which is related I think to the first one is whether it makes sense to plot the gmass model as aggregate=FALSE, knowing that we have a single parameter in the model (Mass), In the gmass2 model, there is a single parameter (probability of survival) which may or may not change across mass. In the gmass model, there are two parameters (intercept and slope) which may or may not vary across mass. and this parameter is also the parameter we use as the order.by= parameter plot(gmass, functional=meanL2BB, aggregate=FALSE) I think the whole point around questions 1 and 2 is that I don't understand the interpretation of the intercept in the gmass model??? Third question: how to choose the proper functional? Always difficult because there are no strong optimality results. If you test just a single parameter (i.e., gmass2), then I would expect that most functionals should lead to similar performance. I would try plot(gmass2, functional = maxBB, aggregate = FALSE) plot(gmass2, functional = meanL2BB) plot(gmass2, functional = supLM(0.1)) where the latter two would probably have somewhat higher power. But the rangeBB might also work rather well here... I have seen that you discuss that in your CSDA(2006 2003
Re: [R] strucchange Fstats() example
Thanks a lot for your answer Achim, this helped a lot. I have done a lot of reading, following your recommendations and I think I have a better idea of what I should use. My dataset contains binary data on survival of the calf depending on the mass of the mother. We know that the probability of survival of the calf should vary according to the mass of the mother: 3 groups of mass expected, lower survival of the calves for small and large females, best survival of calf for intermediate-sized females. I want to identify at which masses those changes in survival occur. I think the code I need to use in order to test what I want is something of that type: gmass-gefp(Success ~ Mass, family=binomial, fit=glm, order.by=~Mass) The first question I have is what is the difference between testing the gmass model shown up compared to the gmass2 model below? gmass2-gefp(Success ~ 1, family=binomial, fit=glm, order.by=~Mass) The second question, which is related I think to the first one is whether it makes sense to plot the gmass model as aggregate=FALSE, knowing that we have a single parameter in the model (Mass), and this parameter is also the parameter we use as the order.by= parameter plot(gmass, functional=meanL2BB, aggregate=FALSE) I think the whole point around questions 1 and 2 is that I don't understand the interpretation of the intercept in the gmass model??? Third question: how to choose the proper functional? I have seen that you discuss that in your CSDA(2006 2003) papers and, in the 2006 paper you say: in situations where there is a shift in the parameters and then a second shift back, it is advantageous to aggregate over time using the range and . Which means, if I understand well that rangeBB would be adapted to the kind of test I want to perform. However, since I want to determine the timing of the peaks, I need my functional to produce a time series plot, for example like meanL2BB does. Do you think I can use meanL2BB functional in my case or should I compute an home-made functional which would use the range of efp but with comp applied first and time after (is this possible???). Fourth question: is it OK to only make a visual estimation of the breakpoints from the peaks seen on the graph after plotting the efp or should I use the breakpoints() function to properly date the breakpoints??? I'm not sure this breakpoints() function can be applied to binary data? Fifth question: I have noticed that the p-values I obtain after performing the sctest(gmass, functional=meanL2BB) for example are a bit different depending on if I introduce family=binomial as an argument in my gefp() call. Should I use this argument or is it used by default when you specify fit=glm??? Last question, you said in your previous message that I could look at the maxstat_test() from package coin for an interesting nonparametric alternative but I think this package does not allow estimation of more than one breakpoint??? Thanks heaps if you can help again with those issues, Best, Geraldine -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Achim Zeileis Sent: 30. mai 2012 08:23 To: R-help@r-project.org Subject: Re: [R] strucchange Fstats() example On Tue, 29 May 2012, Mabille, Geraldine wrote: snip In the second example, the authors state the presence of at least two breakpoints. When plotting the F-statistics using the following code, we see indeed two peaks in the F-statistics, that coincides with the dates given by the authors: c.a 1973 and 1983 but when trying to add those breakpoints to the time series, only one is taken into account The breakpoints() method for Fstats objects can just extract a single breakpoint. The reason is that maximizing the F statistics is equivalent to minimizing the residual sum of squares of a model with a single breakpoint. If you want to estimate more than a single breakpoint, you need to minimize the corresponding segmented sums of squares. This can be done with the formula method of breakpoints(), see ?breakpoints. More specificially: In your example with breakpoints(fs, breaks = 2), the breaks argument is simply ignored. The method just does not have a breaks argument and it goes through ... We see that even though the F-statistics seem to show the existence of 2 breakpoints, only one is detected by the breakpoints() function. Does anyone know how this is possible? I'm totally new to strucchange so it might well be something obvious I'm missing here! Please have a closer look at the package's documentation and the corresponding papers. See citation(strucchange) for the most important references and the corresponding manual pages for more details. For the breakpoints issue you should probably start reading the CSDA paper. OTHER SIDE QUESTION: can strucchange be used if the y variable is binary??? Testing for breakpoints can be done
Re: [R] strucchange Fstats() example
On Tue, 29 May 2012, Mabille, Geraldine wrote: snip In the second example, the authors state the presence of at least two breakpoints. When plotting the F-statistics using the following code, we see indeed two peaks in the F-statistics, that coincides with the dates given by the authors: c.a 1973 and 1983 but when trying to add those breakpoints to the time series, only one is taken into account The breakpoints() method for Fstats objects can just extract a single breakpoint. The reason is that maximizing the F statistics is equivalent to minimizing the residual sum of squares of a model with a single breakpoint. If you want to estimate more than a single breakpoint, you need to minimize the corresponding segmented sums of squares. This can be done with the formula method of breakpoints(), see ?breakpoints. More specificially: In your example with breakpoints(fs, breaks = 2), the breaks argument is simply ignored. The method just does not have a breaks argument and it goes through ... We see that even though the F-statistics seem to show the existence of 2 breakpoints, only one is detected by the breakpoints() function. Does anyone know how this is possible? I'm totally new to strucchange so it might well be something obvious I'm missing here! Please have a closer look at the package's documentation and the corresponding papers. See citation(strucchange) for the most important references and the corresponding manual pages for more details. For the breakpoints issue you should probably start reading the CSDA paper. OTHER SIDE QUESTION: can strucchange be used if the y variable is binary??? Testing for breakpoints can be done with the function gefp(). See its manual pages for references and details. The manual page just has a Poisson GLM example but the corresponding papers (in Stat Neerl and CSDA) also have binary response examples. If you have a binary response and just want to test whether the proportion of successes changes across time (or some other variable of interest), then maxstat_test() from package coin might be an interesting nonparametric alternative. hth, Z __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] strucchange Fstats() example
Dear all, I'm trying to understand how the strucchange package is working and I have been looking at the examples given for the Fstats() function. The first example (Nile), shows one peak in the F-stats and one breakpoint is estimated, that can be plotted using the following code ## Nile data with one breakpoint: the annual flows drop in 1898 ## because the first Ashwan dam was built data(Nile) plot(Nile) ## test the null hypothesis that the annual flow remains constant ## over the years fs.nile - Fstats(Nile ~ 1) plot(fs.nile, alpha=0.05) sctest(fs.nile) ## visualize the breakpoint implied by the argmax of the F statistics plot(Nile) lines(breakpoints(fs.nile)) In the second example, the authors state the presence of at least two breakpoints. When plotting the F-statistics using the following code, we see indeed two peaks in the F-statistics, that coincides with the dates given by the authors: c.a 1973 and 1983 but when trying to add those breakpoints to the time series, only one is taken into account ## UK Seatbelt data: a SARIMA(1,0,0)(1,0,0)_12 model ## (fitted by OLS) is used and reveals (at least) two ## breakpoints - one in 1973 associated with the oil crisis and ## one in 1983 due to the introduction of compulsory ## wearing of seatbelts in the UK. data(UKDriverDeaths) seatbelt - log10(UKDriverDeaths) seatbelt - cbind(seatbelt, lag(seatbelt, k = -1), lag(seatbelt, k = -12)) colnames(seatbelt) - c(y, ylag1, ylag12) seatbelt - window(seatbelt, start = c(1970, 1), end = c(1984,12)) plot(seatbelt[,y], ylab = expression(log[10](casualties))) ## compute F statistics for potential breakpoints between ## 1971(6) (corresponds to from = 0.1) and 1983(6) (corresponds to ## to = 0.9 = 1 - from, the default) ## compute F statistics fs - Fstats(y ~ ylag1 + ylag12, data = seatbelt, from = 0.1) ## this gives the same result fs - Fstats(y ~ ylag1 + ylag12, data = seatbelt, from = c(1971, 6), to = c(1983, 6)) ## plot the F statistics plot(fs, alpha = 0.05) ###TRYING to plot the 2 breakpoints: lines(breakpoints(fs,breaks=2)) ...only one breakpoint is plotted. When looking at the results from breakpoints(fs,breaks=2) Optimal 2-segment partition: Call: breakpoints.Fstats(obj = fs, breaks = 2) Breakpoints at observation number: 46 Corresponding to breakdates: 1973(10) We see that even though the F-statistics seem to show the existence of 2 breakpoints, only one is detected by the breakpoints() function. Does anyone know how this is possible? I'm totally new to strucchange so it might well be something obvious I'm missing here! OTHER SIDE QUESTION: can strucchange be used if the y variable is binary??? Thanks heaps if anyone can help Geraldine From: Mabille, Geraldine Sent: 29. mai 2012 11:30 To: 'r-help@r-project.org' Subject: GAM interactions, by example Dear all, I'm using the mgcv library by Simon Wood to fit gam models with interactions and I have been reading (and running) the factor 'by' variable example given on the gam.models help page (see below, output from the two first models b, and b1). The example explains that both b and b1 fits are similar: note that the preceding fit (here b) is the same as (b1) I agree with the idea that it looks the same but when I look at the results from both models (summary b and summary b1) I see that the results look in fact quite different (edf, and also deviance explained for example???) Are those two models (b and b1) really testing the same things??? If yes, why are the results so different between models??? Thanks a lot if anyone can help with that... Geraldine dat - gamSim(4) ## fit model... b - gam(y ~ fac+s(x2,by=fac)+s(x0),data=dat) plot(b,pages=1) summary(b) Family: gaussian Link function: identity Formula: y ~ fac + s(x2, by = fac) + s(x0) Parametric coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 1.1784 0.1985 5.937 6.59e-09 *** fac2 -1.2148 0.2807 -4.329 1.92e-05 *** fac3 2.2012 0.2436 9.034 2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Approximate significance of smooth terms: edf Ref.df F p-value s(x2):fac1 5.364 6.472 2.285 0.0312 * s(x2):fac2 4.523 5.547 11.396 4.59e-11 *** s(x2):fac3 8.024 8.741 43.456 2e-16 *** s(x0) 1.000 1.000 0.237 0.6269 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 R-sq.(adj) = 0.634 Deviance explained = 65.3% GCV score = 4.0288 Scale est. = 3.8082n = 400 ## note that the preceding fit is the same as b1-gam(y ~ s(x2,by=as.numeric(fac==1))+s(x2,by=as.numeric(fac==2))+ s(x2,by=as.numeric(fac==3))+s(x0)-1,data=dat) ## ... the `-1' is because the intercept is confounded with the ## *uncentred* smooths here. plot(b1,pages=1) summary(b1) Family: gaussian Link function: identity Formula: y ~ s(x2, by = as.numeric(fac == 1)) + s(x2, by = as.numeric(fac == 2)) + s(x2, by = as.numeric(fac == 3)) + s(x0) - 1 Approximate