Re: [R] strucchange Fstats() example

2012-06-01 Thread Achim Zeileis

On Thu, 31 May 2012, Mabille, Geraldine wrote:

Thanks a lot for your answer Achim, this helped a lot. I have done a lot 
of reading, following your recommendations, and I think I have a better 
idea of what I should use. My dataset contains binary data on survival 
of the calf depending on the mass of the mother. We know that the 
probability of survival of the calf should vary according to the mass of 
the mother: 3 mass groups are expected, with lower survival of the calves 
for small and large females and best survival of the calves for 
intermediate-sized females. I want to identify at which masses those 
changes in survival occur.


I think the code I need to use in order to test what I want is something 
like this:
gmass <- gefp(Success ~ Mass, family = binomial, fit = glm, order.by = ~ Mass)

The first question I have is: what is the difference between testing the gmass 
model shown above compared to the gmass2 model below?
gmass2 <- gefp(Success ~ 1, family = binomial, fit = glm, order.by = ~ Mass)


gmass2 will be appropriate if under the null hypothesis there is a 
constant probability of survival and under the alternative there are 
different groups each of which has a constant probability of survival.


gmass will be appropriate if under the null hypothesis the logit of the 
probability of survival increases/decreases linearly with mass - and under 
the alternative there are different groups each of which has a probability 
that increases/decreases with mass.


From your description above I suspect that gmass2 is what you are looking 
for.
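To make the distinction concrete, here is a small self-contained sketch of the gmass2 setup with simulated data (the variable names Success and Mass follow the thread; the group boundaries and probabilities are invented for illustration):

```r
library("strucchange")

## simulated sketch of the setting described above: survival probability
## is constant within three (made-up) mass groups
set.seed(1)
Mass <- sort(runif(200, 100, 300))
p <- ifelse(Mass < 160, 0.4, ifelse(Mass < 240, 0.8, 0.4))
Success <- rbinom(200, 1, p)

## gmass2: constant success probability under the null hypothesis
gmass2 <- gefp(Success ~ 1, family = binomial, fit = glm, order.by = ~ Mass)
sctest(gmass2, functional = meanL2BB)
```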

The second question, which I think is related to the first one, is 
whether it makes sense to plot the gmass model with aggregate=FALSE, 
knowing that we have a single parameter in the model (Mass),


In the gmass2 model, there is a single parameter (probability of survival) 
which may or may not change across mass.


In the gmass model, there are two parameters (intercept and slope) which 
may or may not vary across mass.


and this parameter is also the parameter we use as the order.by= 
argument:
plot(gmass, functional=meanL2BB, aggregate=FALSE)
I think the whole point around questions 1 and 2 is that I don't 
understand the interpretation of the intercept in the gmass model???


Third question: how to choose the proper functional?


Always difficult because there are no strong optimality results. If you 
test just a single parameter (i.e., gmass2), then I would expect that most 
functionals should lead to similar performance. I would try


plot(gmass2, functional = maxBB, aggregate = FALSE)
plot(gmass2, functional = meanL2BB)
plot(gmass2, functional = supLM(0.1))

where the latter two would probably have somewhat higher power. But the 
rangeBB might also work rather well here...


I have seen that you discuss that in your CSDA (2006 & 2003) papers and, 
in the 2006 paper, you say: in situations where there is a shift in the 
parameters and then a second shift back, it is advantageous to aggregate 
over time using the range ... Which means, if I understand well, 
that rangeBB would be adapted to the kind of test I want to perform. 
However, since I want to determine the timing of the peaks, I need my 
functional to produce a time series plot, for example like meanL2BB 
does. Do you think I can use the meanL2BB functional in my case, or should I 
compute a home-made functional which would use the range of the efp but 
with comp applied first and time after (is this possible???).


When you test only a single parameter, then you don't need to aggregate 
across the components anyway because there is just a single one.
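For completeness, strucchange does let you build home-made functionals via efpFunctional(), whose functional argument can be a list with comp and time components (comp is applied across components first, time aggregates afterwards). A hedged sketch only; the particular comp/time choices are invented, so check ?efpFunctional for the exact interface:

```r
library("strucchange")

## hypothetical range-type functional: aggregate over the components
## first (sum of squares), then take the range over "time" (here: mass);
## critical values are obtained by simulation inside efpFunctional()
rangeL2BB <- efpFunctional(
  functional = list(comp = function(x) sum(x^2),
                    time = function(x) diff(range(x))))

## usage would then be, e.g.:
## sctest(gmass2, functional = rangeL2BB)
## plot(gmass2, functional = rangeL2BB)
```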


Fourth question: is it OK to only make a visual estimation of the 
breakpoints from the peaks seen on the graph after plotting the efp, 
or should I use the breakpoints() function to properly date the 
breakpoints??? I'm not sure this breakpoints() function can be applied 
to binary data?


In the fxregime package there is an unexported gbreakpoints() function 
that can be used. For example, if your data is in a data.frame d:


gbp <- fxregime:::gbreakpoints(Success ~ 1, data = d,
  order.by = d$Mass, fit = glm, family = binomial, h = 20, ic = BIC)
plot(gbp)
breakpoints(gbp)

Note that the code is not optimized for glm and hence can be rather slow - 
extremely slow if you have many observations (more than a few hundred). 
How many observations do you have?


Fifth question: I have noticed that the p-values I obtain after 
performing sctest(gmass, functional=meanL2BB), for example, are a bit 
different depending on whether I introduce family=binomial as an argument in 
my gefp() call. Should I use this argument, or is it used by default when 
you specify fit=glm???


If you want to fit a binomial model, yes. Otherwise a linear regression is 
used. (The 'family' argument is simply passed on to glm.)


Last question: you said in your previous message that I could look at 
maxstat_test() from package coin for an interesting nonparametric 
alternative, but I think this package does not allow estimation of more 
than one breakpoint???

Re: [R] strucchange Fstats() example

2012-06-01 Thread Mabille, Geraldine
Hi, and thanks again for your answer. I have just a last question regarding the 
choice of the functional... if you have time to help on that again.

I have tried running sctest() using the functionals you recommended:
sctest(gmass2, functional = meanL2BB)
sctest(gmass2, functional = rangeBB)
sctest(gmass2, functional = supLM(0.1))

and I have a question regarding the meanL2BB test. The P-value obtained for 
this test is P=0.18 and the plot obtained is given as an attachment to the 
message. What I don't understand is why the overall test for meanL2BB is not 
significant while we see on the graph at least two peaks above the red line? 
The other two tests (rangeBB and supLM) are both significant at P=0.03.

Thanks again for your precious help,
Geraldine


Re: [R] strucchange Fstats() example

2012-06-01 Thread Achim Zeileis

On Fri, 1 Jun 2012, Mabille, Geraldine wrote:


Hi, and thanks again for your answer. I have just a last question regarding the 
choice of the functional... if you have time to help on that again.

I have tried running sctest() using the functionals you recommended:
sctest(gmass2, functional = meanL2BB)
sctest(gmass2, functional = rangeBB)
sctest(gmass2, functional = supLM(0.1))

and I have a question regarding the meanL2BB test. The P-value obtained 
for this test is P=0.18 and the plot obtained is given as an attachment 
to the message. What I don't understand is why the overall test for 
meanL2BB is not significant while we see on the graph at least two peaks 
above the red line?


This is discussed at the end of Section 5.1 (middle of page 3001) in the 
2006 CSDA paper. For the meanL2BB functional the test statistic is the 
_mean_ of the process, not the _maximum_. Hence, the mean (visualized by 
the dashed horizontal line) needs to exceed the critical value (red 
horizontal line) to obtain a significant result.
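A toy numeric illustration of the difference (plain R, numbers invented): a process with two sharp peaks can cross a pointwise critical value while its mean stays well below it:

```r
## 100 "time" points: mostly moderate values plus two sharp peaks
proc <- c(rep(0.5, 40), 3, rep(0.5, 40), 3, rep(0.5, 18))
max(proc)   ## 3.00 -> a maximum-type statistic reacts to the peaks
mean(proc)  ## 0.55 -> a mean-type statistic barely notices them
```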


Best,
Z


Re: [R] strucchange Fstats() example

2012-06-01 Thread Mabille, Geraldine
Thanks! I had seen this section but hadn't understood that the dashed line 
represented the mean.
Does that mean, in my case (two tests barely significant at P=0.03 and one 
non-significant), that I should reject the hypothesis of different levels of 
survival probability for different groups of females??


Re: [R] strucchange Fstats() example

2012-06-01 Thread Achim Zeileis

On Fri, 1 Jun 2012, Mabille, Geraldine wrote:

Thanks! I had seen this section but hadn't understood that the dashed 
line represented the mean. Does that mean, in my case (two tests barely 
significant at P=0.03 and one non-significant), that I should reject the 
hypothesis of different levels of survival probability for different 
groups of females??


Hard to say with the information given. The evidence does not seem to be 
overwhelming, but given that the tests are asymptotic in nature it might be 
better to employ a nonparametric test.


But this now goes beyond the R-help focus, I guess. If you contact me 
off-list, I can have a look if you're interested.


Best,
Z



Re: [R] strucchange Fstats() example

2012-05-31 Thread Mabille, Geraldine
Thanks a lot for your answer Achim, this helped a lot. I have done a lot of 
reading, following your recommendations, and I think I have a better idea of 
what I should use. My dataset contains binary data on survival of the calf 
depending on the mass of the mother. We know that the probability of survival 
of the calf should vary according to the mass of the mother: 3 mass groups are 
expected, with lower survival of the calves for small and large females and 
best survival of the calves for intermediate-sized females. I want to identify 
at which masses those changes in survival occur.

I think the code I need to use in order to test what I want is something 
like this:
gmass <- gefp(Success ~ Mass, family = binomial, fit = glm, order.by = ~ Mass)

The first question I have is: what is the difference between testing the gmass 
model shown above compared to the gmass2 model below?
gmass2 <- gefp(Success ~ 1, family = binomial, fit = glm, order.by = ~ Mass)

The second question, which I think is related to the first one, is whether it 
makes sense to plot the gmass model with aggregate=FALSE, knowing that we have 
a single parameter in the model (Mass), and this parameter is also the 
parameter we use as the order.by= argument:
plot(gmass, functional=meanL2BB, aggregate=FALSE)
I think the whole point around questions 1 and 2 is that I don't understand 
the interpretation of the intercept in the gmass model???

Third question: how to choose the proper functional? I have seen that you 
discuss that in your CSDA (2006 & 2003) papers and, in the 2006 paper, you say: 
in situations where there is a shift in the parameters and then a second shift 
back, it is advantageous to aggregate over time using the range ... 
Which means, if I understand well, that rangeBB would be adapted to the kind of 
test I want to perform. However, since I want to determine the timing of the 
peaks, I need my functional to produce a time series plot, for example like 
meanL2BB does. Do you think I can use the meanL2BB functional in my case, or 
should I compute a home-made functional which would use the range of the efp 
but with comp applied first and time after (is this possible???).

Fourth question: is it OK to only make a visual estimation of the breakpoints 
from the peaks seen on the graph after plotting the efp, or should I use the 
breakpoints() function to properly date the breakpoints??? I'm not sure this 
breakpoints() function can be applied to binary data?

Fifth question: I have noticed that the p-values I obtain after performing 
sctest(gmass, functional=meanL2BB), for example, are a bit different depending 
on whether I introduce family=binomial as an argument in my gefp() call. Should 
I use this argument, or is it used by default when you specify fit=glm???

Last question: you said in your previous message that I could look at 
maxstat_test() from package coin for an interesting nonparametric alternative, 
but I think this package does not allow estimation of more than one 
breakpoint???

Thanks heaps if you can help again with those issues,
Best,
Geraldine





 


Re: [R] strucchange Fstats() example

2012-05-30 Thread Achim Zeileis

On Tue, 29 May 2012, Mabille, Geraldine wrote:

snip

In the second example, the authors state the presence of at least two 
breakpoints. When plotting the F statistics using the following code, we see 
indeed two peaks in the F statistics, which coincide with the dates given by 
the authors, ca. 1973 and 1983, but when trying to add those breakpoints to 
the time series, only one is taken into account


The breakpoints() method for Fstats objects can just extract a single 
breakpoint. The reason is that maximizing the F statistics is equivalent to 
minimizing the residual sum of squares of a model with a single breakpoint. If 
you want to estimate more than a single breakpoint, you need to minimize the 
corresponding segmented sums of squares. This can be done with the formula 
method of breakpoints(), see ?breakpoints.
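As a self-contained sketch with the Nile series that also appears later in this thread (the selection of the number of breaks via the RSS/BIC table is illustrative):

```r
library("strucchange")
data("Nile")

## formula method: minimizes the segmented residual sum of squares and
## can estimate several breakpoints (unlike the Fstats method)
bp.nile <- breakpoints(Nile ~ 1)
summary(bp.nile)      ## RSS and BIC for 0, 1, 2, ... breaks
breakdates(bp.nile)   ## dating of the selected breakpoint(s)

## a fixed number of breaks can be requested explicitly:
## breakpoints(Nile ~ 1, breaks = 2)
```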


More specifically: in your example with breakpoints(fs, breaks = 2), the 
breaks argument is simply ignored. The method just does not have a breaks 
argument and it goes through '...'.


We see that even though the F-statistics seem to show the existence of 2 
breakpoints, only one is detected by the breakpoints() function. Does anyone 
know how this is possible? I'm totally new to strucchange so it might well be 
something obvious I'm missing here!


Please have a closer look at the package's documentation and the corresponding 
papers. See citation("strucchange") for the most important references and the 
corresponding manual pages for more details. For the breakpoints issue you 
should probably start reading the CSDA paper.



OTHER SIDE QUESTION: can strucchange be used if the y variable is binary???


Testing for breakpoints can be done with the function gefp(). See its manual 
pages for references and details. The manual page just has a Poisson GLM 
example but the corresponding papers (in Stat Neerl and CSDA) also have binary 
response examples.
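A minimal sketch of such a test for a binary response (the data here are 
simulated, and the variable names Success and Mass are only illustrative):

```r
library(strucchange)

## simulated binary response whose success probability shifts with Mass
set.seed(1)
d <- data.frame(Mass = sort(runif(200, 50, 150)))
d$Success <- rbinom(200, 1,
                    ifelse(d$Mass < 80 | d$Mass > 120, 0.4, 0.8))

## empirical fluctuation process: constant success probability under H0,
## ordered by Mass rather than by time
gmass2 <- gefp(Success ~ 1, fit = glm, family = binomial,
               order.by = ~ Mass, data = d)
plot(gmass2)    ## fluctuation process with boundary
sctest(gmass2)  ## corresponding significance test
```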


If you have a binary response and just want to test whether the proportion of 
successes changes across time (or some other variable of interest), then 
maxstat_test() from package coin might be an interesting nonparametric 
alternative.
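A sketch of the nonparametric alternative (again with hypothetical 
Success/Mass data): maxstat_test() searches over cutpoints in the ordering 
variable for the maximally selected two-sample statistic.

```r
library(coin)

## simulated binary response, as above
set.seed(1)
d <- data.frame(Mass = runif(200, 50, 150))
d$Success <- factor(rbinom(200, 1,
                           ifelse(d$Mass < 80 | d$Mass > 120, 0.4, 0.8)))

## maximally selected statistics: at which cutpoint in Mass does the
## proportion of successes change most?
maxstat_test(Success ~ Mass, data = d)
```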


hth,
Z

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] strucchange Fstats() example

2012-05-29 Thread Mabille, Geraldine
Dear all,
I'm trying to understand how the strucchange package works and I have been 
looking at the examples given for the Fstats() function.
The first example (Nile) shows one peak in the F stats, and one breakpoint is 
estimated, which can be plotted using the following code:

## Nile data with one breakpoint: the annual flows drop in 1898
## because the first Ashwan dam was built
data(Nile)
plot(Nile)
## test the null hypothesis that the annual flow remains constant
## over the years
fs.nile <- Fstats(Nile ~ 1)
plot(fs.nile, alpha = 0.05)
sctest(fs.nile)
## visualize the breakpoint implied by the argmax of the F statistics
plot(Nile)
lines(breakpoints(fs.nile))


In the second example, the authors state the presence of at least two 
breakpoints. When plotting the F statistics using the following code, we do 
indeed see two peaks in the F statistics, which coincide with the dates given 
by the authors (ca. 1973 and 1983), but when trying to add those breakpoints 
to the time series, only one is taken into account.


## UK Seatbelt data: a SARIMA(1,0,0)(1,0,0)_12 model
## (fitted by OLS) is used and reveals (at least) two
## breakpoints - one in 1973 associated with the oil crisis and
## one in 1983 due to the introduction of compulsory
## wearing of seatbelts in the UK.
data(UKDriverDeaths)
seatbelt <- log10(UKDriverDeaths)
seatbelt <- cbind(seatbelt, lag(seatbelt, k = -1), lag(seatbelt, k = -12))
colnames(seatbelt) <- c("y", "ylag1", "ylag12")
seatbelt <- window(seatbelt, start = c(1970, 1), end = c(1984, 12))
plot(seatbelt[, "y"], ylab = expression(log[10](casualties)))
## compute F statistics for potential breakpoints between
## 1971(6) (corresponds to from = 0.1) and 1983(6) (corresponds to
## to = 0.9 = 1 - from, the default)
fs <- Fstats(y ~ ylag1 + ylag12, data = seatbelt, from = 0.1)
## this gives the same result
fs <- Fstats(y ~ ylag1 + ylag12, data = seatbelt, from = c(1971, 6),
  to = c(1983, 6))
## plot the F statistics
plot(fs, alpha = 0.05)

### trying to plot the 2 breakpoints:
lines(breakpoints(fs, breaks = 2))

...only one breakpoint is plotted. When looking at the results from

breakpoints(fs, breaks = 2)

Optimal 2-segment partition:

Call: breakpoints.Fstats(obj = fs, breaks = 2)
Breakpoints at observation number:
46
Corresponding to breakdates:
1973(10)

We see that even though the F-statistics seem to show the existence of 2 
breakpoints, only one is detected by the breakpoints() function. Does anyone 
know how this is possible? I'm totally new to strucchange so it might well be 
something obvious I'm missing here!


OTHER SIDE QUESTION: can strucchange be used if the y variable is binary???

Thanks heaps if anyone can help
Geraldine

From: Mabille, Geraldine
Sent: 29. mai 2012 11:30
To: 'r-help@r-project.org'
Subject: GAM interactions, by example

Dear all,
I'm using the mgcv library by Simon Wood to fit GAM models with interactions, 
and I have been reading (and running) the factor 'by' variable example given 
on the gam.models help page (see below, output from the first two models, b 
and b1).
The example states that the b and b1 fits are the same ("note that the 
preceding fit is the same as" b1, referring to b).
I agree that the plots look the same, but when I look at the results from 
both models (summary(b) and summary(b1)) the results look in fact quite 
different (edf, and also deviance explained, for example).
Are those two models (b and b1) really testing the same things? If yes, why 
are the results so different between models?
Thanks a lot if anyone can help with that...
Geraldine


dat <- gamSim(4)

## fit model...
b <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = dat)
plot(b, pages = 1)
summary(b)

Family: gaussian
Link function: identity

Formula:
y ~ fac + s(x2, by = fac) + s(x0)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.1784     0.1985   5.937 6.59e-09 ***
fac2         -1.2148     0.2807  -4.329 1.92e-05 ***
fac3          2.2012     0.2436   9.034  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
             edf Ref.df      F  p-value
s(x2):fac1 5.364  6.472  2.285   0.0312 *
s(x2):fac2 4.523  5.547 11.396 4.59e-11 ***
s(x2):fac3 8.024  8.741 43.456  < 2e-16 ***
s(x0)      1.000  1.000  0.237   0.6269
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.634   Deviance explained = 65.3%
GCV score = 4.0288  Scale est. = 3.8082  n = 400

## note that the preceding fit is the same as
b1 <- gam(y ~ s(x2, by = as.numeric(fac == 1)) + s(x2, by = as.numeric(fac == 2)) +
  s(x2, by = as.numeric(fac == 3)) + s(x0) - 1, data = dat)
## ... the `-1' is because the intercept is confounded with the
## *uncentred* smooths here.
plot(b1, pages = 1)
summary(b1)

Family: gaussian
Link function: identity

Formula:
y ~ s(x2, by = as.numeric(fac == 1)) + s(x2, by = as.numeric(fac ==
2)) + s(x2, by = as.numeric(fac == 3)) + s(x0) - 1

Approximate