Re: [R] Specifying Path Model in SEM for CFA

2006-08-17 Thread Rick Bilonick
I'm trying to build a multivariate receptor model as described by
Christensen and Sain (Technometrics, vol 44 (4) pp. 328-337). The model
is

x_t = Af_t + e_t

where A is the matrix of nonnegative source compositions, x_t are the
observed pollutant concentrations at time t, and f_t are the unobserved
factors. The columns of A are supposed to sum to no more than 100%. They
say they are using a latent variable model. If sem can't handle this, do
you know of another R package that could?

Rick B.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Specifying Path Model in SEM for CFA

2006-08-17 Thread John Fox
Dear Rick,



sem() handles only equality constraints among parameters, and this model
requires linear inequality constraints. 

I'm aware of SEM software that handles inequality constraints, but I'm not
aware of anything in R that will do it out of the box. One possibility is
to write out the likelihood (or fitting function) for your model and
perform a bounded optimization using optim(). It would probably be a fair
amount of work setting up the problem.
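
As a rough illustration of that suggestion, here is a minimal base-R sketch of a bounded fit with optim(). It is not taken from the thread: the simple-structure loading pattern, the parameter values, and the helper names implied_cov() and ml_discrepancy() are all assumptions made for the example, and a population covariance matrix stands in for a sample covariance matrix. Only the nonnegativity bounds on the loadings are imposed; the column-sum limit is a linear inequality, which box bounds cannot express.

```r
# Model: x_t = A f_t + e_t, implied covariance Sigma = A Phi A' + Theta.
# Assumed pattern (hypothetical): source 1 -> X1..X3, source 2 -> X4..X6.
p <- 6; k <- 2
pattern <- cbind(c(1, 1, 1, 0, 0, 0), c(0, 0, 0, 1, 1, 1)) == 1

implied_cov <- function(par) {
  A <- matrix(0, p, k)
  A[pattern] <- par[1:6]                         # free loadings (bounded >= 0)
  Phi <- matrix(c(1, par[7], par[7], 1), 2, 2)   # factor variances fixed at 1
  Theta <- diag(par[8:13])                       # error variances
  A %*% Phi %*% t(A) + Theta
}

# Normal-theory ML discrepancy: log|Sigma| + tr(S Sigma^-1) - log|S| - p
ml_discrepancy <- function(par, S) {
  Sigma <- implied_cov(par)
  logdet <- function(M) as.numeric(determinant(M, logarithm = TRUE)$modulus)
  logdet(Sigma) + sum(diag(S %*% solve(Sigma))) - logdet(S) - nrow(S)
}

# Made-up "true" values; each column of A sums to 1 or less.
true_par <- c(0.5, 0.3, 0.2, 0.4, 0.35, 0.25, 0.3, rep(0.1, 6))
S <- implied_cov(true_par)   # stand-in for a sample covariance matrix

start <- c(rep(0.3, 6), 0, rep(0.2, 6))
lower <- c(rep(0, 6), -0.99, rep(1e-4, 6))   # loadings >= 0, variances > 0
upper <- c(rep(Inf, 6), 0.99, rep(Inf, 6))

fit <- optim(start, ml_discrepancy, S = S, method = "L-BFGS-B",
             lower = lower, upper = upper)
A_hat <- fit$par[1:6]   # estimated nonnegative loadings
```

Base R's constrOptim() accepts linear inequality constraints of the form ui %*% theta - ci >= 0, so it might be one way to add the column-sum limit, though that is untested here.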

Finally, there are tricks that permit the imposition of general constraints
and inequality constraints using software, like sem(), that handles only
equality constraints. It's probably possible to do what you want using such
a trick, but it would be awkward. See the references given in Bollen,
Structural Equations with Latent Variables (Wiley, 1989), pp. 401-403.

I'm sorry that I can't be of more direct help.
 John



Re: [R] Specifying Path Model in SEM for CFA

2006-08-17 Thread Rick Bilonick

Thanks. I'll explore the options you mention. I would like to use R
because I need to couple this with block bootstrapping to handle time
dependencies.

Rick



Re: [R] Specifying Path Model in SEM for CFA

2006-08-16 Thread John Fox
Dear Rick,

There are a couple of problems here:

(1) You've fixed the error variance parameters for each of the observed
variables to 1 rather than defining each as a free parameter to estimate.
For example, use

X1 <-> X1, theta1, NA

rather than

X1 <-> X1, NA, 1

The general principle is that if you give a parameter a name, it's a free
parameter to be estimated; if you give the name as NA, then the parameter is
given a fixed value (here, 1). (There is some more information on this and
on error-variance parameters in ?sem.)

(2) I believe that the model you're trying to specify -- in which all
variables but X6 load on F1, and all variables but X5 load on F2 -- is
underidentified.
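
For reference, the parameter counting behind points (1) and (2) can be sketched in a few lines of R. The variable names are made up for the example. Note that a positive number of degrees of freedom is only a necessary condition for identification, not a sufficient one, so this count does not settle whether the model is identified.

```r
# Counting rule: distinct variances/covariances minus free parameters.
n_obs <- 6
n_moments <- n_obs * (n_obs + 1) / 2    # distinct elements of the covariance matrix

# As posted: 8 free loadings, 3 free factor variances/covariances,
# and all 6 error variances fixed at 1.
free_as_posted <- c(loadings = 8, factor_var_cov = 3, error_var = 0)

# As advised in (1): the 6 error variances become free parameters.
free_as_advised <- c(loadings = 8, factor_var_cov = 3, error_var = 6)

df_as_posted  <- n_moments - sum(free_as_posted)    # matches Df = 10 in the posted summary
df_as_advised <- n_moments - sum(free_as_advised)
```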

In addition, you've set the metric of the factors by fixing one loading to
0.20 and another to 0.25. That should work but strikes me as unusual, and
makes me wonder whether this was what you really intended. It would be more
common in a CFA to fix the variance of each factor to 1, and let the factor
loadings be free parameters. Then the factor covariance would be their
correlation. 
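
A small numeric illustration of that point, using the loadings and the factor variance and covariance estimates (g11, g22, g12) from Rick's posted summary output. This is only a sketch: the matrix names are made up for the example, and the rescaling is ordinary algebra done by hand, not something sem() is being claimed to do. Moving the factor standard deviations into the loadings leaves the common part of the implied covariance matrix unchanged and turns the factor covariance into a correlation (roughly 0.53 here).

```r
# Factor covariance matrix from the posted estimates g11, g12, g22.
Phi <- matrix(c(2.567669, 0.936719,
                0.936719, 1.208497), nrow = 2, byrow = TRUE)

# Loadings from the posted output (rows X1..X6; columns F1, F2);
# 0.20 and 0.25 are the fixed loadings, 0 marks an absent path.
Lambda <- cbind(c(-0.245981, -0.308249, 0.202590, -0.235156, 0.20, 0),
                c( 0.839985,  0.828460, 0.066722,  0.832037, 0,    0.25))

D <- diag(sqrt(diag(Phi)))    # factor standard deviations
Lambda_std <- Lambda %*% D    # loadings when factor variances are 1
Phi_std <- cov2cor(Phi)       # factor correlation matrix

common_raw <- Lambda %*% Phi %*% t(Lambda)
common_std <- Lambda_std %*% Phi_std %*% t(Lambda_std)
same_fit <- isTRUE(all.equal(common_raw, common_std))
factor_cor <- Phi_std[1, 2]
```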

You should not have to specify start values for free parameters (such as
g11, g22, and g12 in your model), though it is not wrong to do so. I would
not, however, specify start values that imply a singular covariance matrix
among the factors, as you've done; I'm surprised that the program was able
to get by the start values to produce a solution.
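
To make the singularity concrete: the posted model gives g11, g22, and g12 a start value of 1 each, so the starting factor covariance matrix has a zero determinant and implies a factor correlation of exactly 1. A quick check in base R (the name Phi_start is just for illustration):

```r
# Start values from the posted model: g11 = 1, g22 = 1, g12 = 1.
Phi_start <- matrix(c(1, 1,
                      1, 1), nrow = 2, byrow = TRUE)

det_start <- det(Phi_start)                             # zero: the matrix is singular
eig_start <- eigen(Phi_start, symmetric = TRUE)$values  # one eigenvalue is zero (up to rounding)
cor_start <- cov2cor(Phi_start)[1, 2]                   # implied factor correlation of 1
```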

BTW, the Thurstone example in ?sem is for a confirmatory factor analysis
(albeit a slightly more complicated one with a second-order factor). There's
also an example of a one-factor CFA in the paper at
http://socserv.socsci.mcmaster.ca/jfox/Misc/sem/SEM-paper.pdf, though this
is for ordinal observed variables.

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 


Re: [R] Specifying Path Model in SEM for CFA

2006-08-16 Thread Rick Bilonick
Thanks for the information. I think I understand how to handle the
residual variance after reading the sem help file more carefully. Now I
have to figure out how to constrain each column of the factor matrix to
sum to one. Maybe this will fix the problem with being under-identified.

Rick B.



Re: [R] Specifying Path Model in SEM for CFA

2006-08-16 Thread John Fox
Dear Rick,

It's unclear to me what you mean by constraining each column of the factor
matrix to sum to one. If you intend to constrain the loadings on each
factor to sum to one, sem() won't do that, since it supports only equality
constraints, not general linear constraints on parameters of the model, but
why such a constraint would be reasonable in the first place escapes me.
More common in confirmatory factor analysis would be to constrain more of
the loadings to zero. Of course, one would do this only if it made
substantive sense in the context of the research.

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 



[R] Specifying Path Model in SEM for CFA

2006-08-15 Thread Rick Bilonick
I'm using specify.model for the sem package. I can't figure out how to
represent the residual errors for the observed variables for a CFA
model. (Once I get this working I need to add some further constraints.)

Here is what I've tried:

model.sa <- specify.model()
  F1 -> X1, l11, NA
  F1 -> X2, l21, NA
  F1 -> X3, l31, NA
  F1 -> X4, l41, NA
  F1 -> X5, NA, 0.20
  F2 -> X1, l12, NA
  F2 -> X2, l22, NA
  F2 -> X3, l32, NA
  F2 -> X4, l42, NA
  F2 -> X6, NA, 0.25
  F1 <-> F2, g12, 1
  F1 <-> F1, g11, 1
  F2 <-> F2, g22, 1
  X1 <-> X1, NA, 1
  X2 <-> X2, NA, 1
  X3 <-> X3, NA, 1
  X4 <-> X4, NA, 1
  X5 <-> X5, NA, 1
  X6 <-> X6, NA, 1

This at least converges:

> summary(fit.sem)

 Model Chisquare =  2147   Df =  10 Pr(>Chisq) = 0
 Chisquare (null model) =  2934   Df =  15
 Goodness-of-fit index =  0.4822
 Adjusted goodness-of-fit index =  -0.087387
 RMSEA index =  0.66107   90 % CI: (NA, NA)
 Bentler-Bonnett NFI =  0.26823
 Tucker-Lewis NNFI =  -0.098156
 Bentler CFI =  0.26790
 BIC =  2085.1

 Normalized Residuals
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  -5.990  -0.618   0.192   0.165   1.700   3.950

 Parameter Estimates
    Estimate  Std Error z value  Pr(>|z|)
l11 -0.245981 0.21863   -1.12510 0.26054748 X1 <--- F1
l21 -0.308249 0.22573   -1.36555 0.17207875 X2 <--- F1
l31  0.202590 0.07910    2.56118 0.01043175 X3 <--- F1
l41 -0.235156 0.21980   -1.06985 0.28468885 X4 <--- F1
l12  0.839985 0.21962    3.82476 0.00013090 X1 <--- F2
l22  0.828460 0.22548    3.67418 0.00023862 X2 <--- F2
l32  0.066722 0.08369    0.79725 0.42530606 X3 <--- F2
l42  0.832037 0.21840    3.80963 0.00013917 X4 <--- F2
g12  0.936719 0.64331    1.45609 0.14536647 F2 <--> F1
g11  2.567669 1.25608    2.04418 0.04093528 F1 <--> F1
g22  1.208497 0.55040    2.19567 0.02811527 F2 <--> F2

 Iterations =  59

And it produces the following path diagram:

> path.diagram(fit.sem)
digraph "fit.sem" {
  rankdir=LR;
  size="8,8";
  node [fontname="Helvetica" fontsize=14 shape=box];
  edge [fontname="Helvetica" fontsize=10];
  center=1;
  "F2" [shape=ellipse]
  "F1" [shape=ellipse]
  "F1" -> "X1" [label="l11"];
  "F1" -> "X2" [label="l21"];
  "F1" -> "X3" [label="l31"];
  "F1" -> "X4" [label="l41"];
  "F1" -> "X5" [label=""];
  "F2" -> "X1" [label="l12"];
  "F2" -> "X2" [label="l22"];
  "F2" -> "X3" [label="l32"];
  "F2" -> "X4" [label="l42"];
  "F2" -> "X6" [label=""];
}

But I don't see the residual error terms that go into each of the
observed variables X1 - X6. I've tried:

model.sa <- specify.model()
  E1 -> X1, e1, 1
  E2 -> X2, e2, 1
  E3 -> X3, e3, 1
  E4 -> X4, e4, 1
  E5 -> X5, e5, 1
  E6 -> X6, e6, 1
  E1 <-> E1, s1, NA
  E2 <-> E2, s2, NA
  E3 <-> E3, s3, NA
  E4 <-> E4, s4, NA
  E5 <-> E5, s5, NA
  E6 <-> E6, s6, NA
  F1 -> X1, l11, NA
  F1 -> X2, l21, NA
  F1 -> X3, l31, NA
  F1 -> X4, l41, NA
  F1 -> X5, NA, 1
  F2 -> X1, l12, NA
  F2 -> X2, l22, NA
  F2 -> X3, l32, NA
  F2 -> X4, l42, NA
  F2 -> X6, NA, 1
  F1 <-> F2, NA, 1
  F1 <-> F1, NA, 1
  F2 <-> F2, g22, NA
  X1 <-> X1, NA, 1
  X2 <-> X2, NA, 1
  X3 <-> X3, NA, 1
  X4 <-> X4, NA, 1
  X5 <-> X5, NA, 1
  X6 <-> X6, NA, 1

I'm trying to use E1 - E6 as the residual error terms. But I get warning
messages about no variances for X1-X6 and it doesn't converge. Also, the
associated path diagram:

digraph "fit.sem" {
  rankdir=LR;
  size="8,8";
  node [fontname="Helvetica" fontsize=14 shape=box];
  edge [fontname="Helvetica" fontsize=10];
  center=1;
  "E1" [shape=ellipse]
  "E2" [shape=ellipse]
  "E3" [shape=ellipse]
  "E4" [shape=ellipse]
  "E5" [shape=ellipse]
  "E6" [shape=ellipse]
  "F2" [shape=ellipse]
  "F1" [shape=ellipse]
  "E1" -> "X1" [label=""];
  "E2" -> "X2" [label=""];
  "E3" -> "X3" [label=""];
  "E4" -> "X4" [label=""];
  "E5" -> "X5" [label=""];
  "E6" -> "X6" [label=""];
  "F1" -> "X1" [label="l11"];
  "F1" -> "X2" [label="l21"];
  "F1" -> "X3" [label="l31"];
  "F1" -> "X4" [label="l41"];
  "F1" -> "X5" [label=""];
  "F2" -> "X1" [label="l12"];
  "F2" -> "X2" [label="l22"];
  "F2" -> "X3" [label="l32"];
  "F2" -> "X4" [label="l42"];
  "F2" -> "X6" [label=""];
}

has ellipses around E1-E6, which I believe indicates that they are being
treated as latent factors and not as residual errors.

If anyone could point me in the right direction, I would appreciate it.

Rick B.
