Re: [R] Specifying Path Model in SEM for CFA
On Wed, 2006-08-16 at 17:01 -0400, John Fox wrote:
> Dear Rick,
>
> It's unclear to me what you mean by constraining each column of the factor matrix to sum to one. If you intend to constrain the loadings on each factor to sum to one, sem() won't do that, since it supports only equality constraints, not general linear constraints on parameters of the model, but why such a constraint would be reasonable in the first place escapes me.
>
> More common in confirmatory factor analysis would be to constrain more of the loadings to zero. Of course, one would do this only if it made substantive sense in the context of the research.
>
> Regards,
>  John
>
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox

I'm trying to build a multivariate receptor model as described by Christensen and Sain (Technometrics, vol 44 (4) pp. 328-337). The model is

    x_t = A f_t + e_t

where A is the matrix of nonnegative source compositions, x_t are the observed pollutant concentrations at time t, and f_t are the unobserved factors. The columns of A are supposed to sum to no more than 100%. They say they are using a latent variable model.

If sem can't handle this, do you know of another R package that could?

Rick B.

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
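To make the receptor model above concrete, here is a minimal base-R sketch that simulates data from x_t = A f_t + e_t and compares the sample covariance with the covariance the model implies. The dimensions, the entries of A, the lognormal source contributions, and the error standard deviation are all assumptions chosen for illustration; none of them come from Christensen and Sain or from this thread.

```r
set.seed(1)

n_times <- 500   # hypothetical number of time points
n_obs   <- 6     # hypothetical number of observed pollutants
n_src   <- 2     # hypothetical number of sources (factors)

# Source composition matrix A: nonnegative, each column sums to at most 1.
A <- cbind(source1 = c(0.30, 0.25, 0.20, 0.10, 0.05, 0.00),
           source2 = c(0.00, 0.05, 0.10, 0.20, 0.25, 0.30))
stopifnot(all(A >= 0), all(colSums(A) <= 1))

# Unobserved source contributions f_t (assumed lognormal, so nonnegative).
f_mat <- matrix(rlnorm(n_times * n_src), n_times, n_src)

# Measurement errors e_t (assumed independent, sd = 0.05).
e_mat <- matrix(rnorm(n_times * n_obs, sd = 0.05), n_times, n_obs)

# Row t of x_mat is x_t = A f_t + e_t.
x_mat <- f_mat %*% t(A) + e_mat

# Covariance implied by the model: A Phi A' + Psi,
# with Phi the factor covariance and Psi the diagonal error covariance.
Sigma_implied <- A %*% cov(f_mat) %*% t(A) + diag(0.05^2, n_obs)

# Largest absolute gap between sample and implied covariance.
max_gap <- max(abs(cov(x_mat) - Sigma_implied))
```

The implied covariance A Phi A' + Psi is the quantity a covariance-based fitting function would try to match, which is why the constraints on A become constraints on the loadings.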
Re: [R] Specifying Path Model in SEM for CFA
Dear Rick,

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Rick Bilonick
> Sent: Thursday, August 17, 2006 7:07 AM
> To: John Fox
> Cc: 'R Help'; 'Rick Bilonick'
> Subject: Re: [R] Specifying Path Model in SEM for CFA
>
> . . .
>
> I'm trying to build a multivariate receptor model as described by Christensen and Sain (Technometrics, vol 44 (4) pp. 328-337). The model is
>
>     x_t = A f_t + e_t
>
> where A is the matrix of nonnegative source compositions, x_t are the observed pollutant concentrations at time t, and f_t are the unobserved factors. The columns of A are supposed to sum to no more than 100%. They say they are using a latent variable model.
>
> If sem can't handle this, do you know of another R package that could?

sem() handles only equality constraints among parameters, and this model requires linear inequality constraints.

I'm aware of SEM software that handles inequality constraints, but I'm not aware of anything in R that will do it out of the box. One possibility is to write out the likelihood (or fitting function) for your model and perform a bounded optimization using optim(). It would probably be a fair amount of work setting up the problem.

Finally, there are tricks that permit the imposition of general constraints and inequality constraints using software, like sem(), that handles only equality constraints. It's probably possible to do what you want using such a trick, but it would be awkward. See the references given in Bollen, Structural Equations with Latent Variables (Wiley, 1989), pp. 401-403.

I'm sorry that I can't be of more direct help.

John
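As a rough illustration of the "write out the fitting function and optimize under constraints" route, the sketch below minimizes the normal-theory ML discrepancy for a one-factor model with nonnegative loadings whose sum is at most 1. It uses constrOptim() from base R rather than optim(), because the column-sum restriction is a linear inequality that box bounds alone cannot express. Everything here is an assumption made for illustration: the single factor with variance fixed at 1, the "true" values, the start values, and the function name ml_discrepancy. It is not the Christensen and Sain estimator.

```r
# Normal-theory ML discrepancy: log|Sigma| + tr(S Sigma^-1) - log|S| - p,
# for a one-factor model Sigma = a a' + diag(psi) with factor variance 1.
ml_discrepancy <- function(theta, S) {
  p     <- nrow(S)
  a     <- theta[seq_len(p)]        # loadings
  psi   <- theta[p + seq_len(p)]    # unique (error) variances
  Sigma <- tcrossprod(a) + diag(psi)
  as.numeric(determinant(Sigma)$modulus) + sum(diag(solve(Sigma, S))) -
    as.numeric(determinant(S)$modulus) - p
}

# Hypothetical population covariance generated from known values.
p        <- 4
a_true   <- c(0.40, 0.30, 0.15, 0.10)
psi_true <- rep(0.05, p)
S        <- tcrossprod(a_true) + diag(psi_true)

# Linear inequality constraints in the form ui %*% theta - ci >= 0:
#   a_i >= 0,  psi_i >= 1e-4,  1 - sum(a) >= 0
ui <- rbind(diag(2 * p), c(rep(-1, p), rep(0, p)))
ci <- c(rep(0, p), rep(1e-4, p), -1)

# Start values must lie strictly inside the feasible region.
start <- c(rep(0.2, p), rep(0.5, p))

# grad = NULL makes constrOptim() use Nelder-Mead inside a log barrier.
fit <- constrOptim(theta = start, f = ml_discrepancy, grad = NULL,
                   ui = ui, ci = ci, control = list(maxit = 2000), S = S)

a_hat <- fit$par[seq_len(p)]
```

With more than one factor the loadings would form a matrix and one sum constraint per column would be added as an extra row of ui; the identification issues raised elsewhere in this thread would still need to be settled separately.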
Re: [R] Specifying Path Model in SEM for CFA
> sem() handles only equality constraints among parameters, and this model requires linear inequality constraints.
>
> I'm aware of SEM software that handles inequality constraints, but I'm not aware of anything in R that will do it out of the box. One possibility is to write out the likelihood (or fitting function) for your model and perform a bounded optimization using optim(). It would probably be a fair amount of work setting up the problem.
>
> Finally, there are tricks that permit the imposition of general constraints and inequality constraints using software, like sem(), that handles only equality constraints. It's probably possible to do what you want using such a trick, but it would be awkward. See the references given in Bollen, Structural Equations with Latent Variables (Wiley, 1989), pp. 401-403.
>
> I'm sorry that I can't be of more direct help.
>
> John

Thanks. I'll explore the options you mention. I would like to use R because I need to couple this with block bootstrapping to handle time dependencies.

Rick
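For the block bootstrapping mentioned above, one common variant is the moving-block bootstrap, which resamples runs of consecutive time points so that short-range dependence is preserved within each block. The sketch below is a generic base-R version under assumed settings; the function name block_bootstrap_index, the block length of 10, and the placeholder data are hypothetical and not taken from the thread.

```r
# Return row indices for one moving-block bootstrap resample of a series
# of length n, built from overlapping blocks of consecutive time points.
block_bootstrap_index <- function(n, block_length) {
  n_blocks <- ceiling(n / block_length)
  # Each block starts at a uniformly chosen admissible position.
  starts <- sample.int(n - block_length + 1, n_blocks, replace = TRUE)
  idx <- unlist(lapply(starts, function(s) s:(s + block_length - 1)))
  idx[seq_len(n)]   # trim so the resample has the original length
}

set.seed(42)
x <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)  # placeholder series

idx    <- block_bootstrap_index(nrow(x), block_length = 10)
x_star <- x[idx, , drop = FALSE]   # one bootstrap replicate of the series
S_star <- cov(x_star)              # covariance to pass to a model fit
```

Repeating the last three lines many times and refitting the model to each S_star would give a bootstrap distribution for the estimates; the block length would need to be chosen to suit the dependence in the actual data.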
Re: [R] Specifying Path Model in SEM for CFA
Dear Rick,

There are a couple of problems here:

(1) You've fixed the error variance parameters for each of the observed variables to 1 rather than defining each as a free parameter to estimate. For example, use

    X1 <-> X1, theta1, NA

rather than

    X1 <-> X1, NA, 1

The general principle is that if you give a parameter a name, it's a free parameter to be estimated; if you give the name as NA, then the parameter is given a fixed value (here, 1). (There is some more information on this and on error-variance parameters in ?sem.)

(2) I believe that the model you're trying to specify -- in which all variables but X6 load on F1, and all variables but X1 load on F2 -- is underidentified.

In addition, you've set the metric of the factors by fixing one loading to 0.20 and another to 0.25. That should work but strikes me as unusual, and makes me wonder whether this was what you really intended. It would be more common in a CFA to fix the variance of each factor to 1, and let the factor loadings be free parameters. Then the factor covariance would be their correlation.

You should not have to specify start values for free parameters (such as g11, g22, and g12 in your model), though it is not wrong to do so. I would not, however, specify start values that imply a singular covariance matrix among the factors, as you've done; I'm surprised that the program was able to get by the start values to produce a solution.

BTW, the Thurstone example in ?sem is for a confirmatory factor analysis (albeit a slightly more complicated one with a second-order factor). There's also an example of a one-factor CFA in the paper at http://socserv.socsci.mcmaster.ca/jfox/Misc/sem/SEM-paper.pdf, though this is for ordinal observed variables.
I hope this helps,
 John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Rick Bilonick
> Sent: Tuesday, August 15, 2006 11:50 PM
> To: R Help
> Subject: [R] Specifying Path Model in SEM for CFA
>
> I'm using specify.model for the sem package. I can't figure out how to represent the residual errors for the observed variables for a CFA model. (Once I get this working I need to add some further constraints.)
>
> . . .
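The two points in the reply above (named parameters are free, NA-named parameters are fixed; start values should not imply a singular factor covariance matrix) can be checked with a few lines of base R. The sketch below only builds specification lines as text in the arrow notation used in this thread and inspects two candidate start-value matrices; it does not call the sem package. The parameter names theta1 to theta6 and the alternative start value of 0.5 are hypothetical.

```r
observed <- paste0("X", 1:6)
factors  <- c("F1", "F2")

# Free error variances: a parameter name in the second field and NA as the
# start value mean "estimate this parameter".
error_lines <- sprintf("%s <-> %s, theta%d, NA",
                       observed, observed, seq_along(observed))

# Fixed factor variances: NA as the name and 1 as the value mean
# "fix this parameter at 1".
factor_lines <- sprintf("%s <-> %s, NA, 1", factors, factors)

writeLines(c(error_lines, factor_lines))

# Start values from the posted model: g11 = 1, g22 = 1, g12 = 1.
Phi_start <- matrix(c(1, 1,
                      1, 1), nrow = 2, byrow = TRUE)
det_start <- det(Phi_start)   # zero determinant: the matrix is singular

# A hypothetical alternative start with covariance 0.5 is nonsingular.
Phi_alt <- matrix(c(1.0, 0.5,
                    0.5, 1.0), nrow = 2, byrow = TRUE)
det_alt <- det(Phi_alt)
```

A unit covariance between two unit-variance factors amounts to a start-value correlation of exactly 1, which is why the first matrix is singular.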
Re: [R] Specifying Path Model in SEM for CFA
On Wed, 2006-08-16 at 08:47 -0400, John Fox wrote:
> Dear Rick,
>
> There are a couple of problems here:
>
> (1) You've fixed the error variance parameters for each of the observed variables to 1 rather than defining each as a free parameter to estimate.
> . . .
> (2) I believe that the model you're trying to specify -- in which all variables but X6 load on F1, and all variables but X1 load on F2 -- is underidentified.
> . . .

Thanks for the information. I think I understand how to handle the residual variance after reading the sem help file more carefully. Now I have to figure out how to constrain each column of the factor matrix to sum to one. Maybe this will fix the problem with being under-identified.

Rick B.
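On the identification question, a quick bookkeeping check is the counting rule: the number of free parameters cannot exceed the number of distinct variances and covariances among the observed variables. The sketch below applies it to the model as originally posted and to a hypothetical respecification with free error variances, factor variances fixed at 1, and all ten loadings free. The helper name count_df and the respecified parameter counts are assumptions for illustration. Note that the rule is only a necessary condition, so a positive count does not show that a model is identified.

```r
# Degrees of freedom = distinct observed moments minus free parameters.
count_df <- function(n_observed, n_free) {
  n_moments <- n_observed * (n_observed + 1) / 2
  n_moments - n_free
}

# Model as posted: 8 free loadings (l11..l41, l12..l42) plus g11, g22, g12.
df_posted <- count_df(6, n_free = 8 + 3)

# Hypothetical respecification: 10 free loadings, 6 free error variances,
# 1 factor correlation, factor variances fixed at 1.
df_respecified <- count_df(6, n_free = 10 + 6 + 1)
```

The value for the posted model agrees with the "Df = 10" reported by summary(fit.sem) in the original post.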
Re: [R] Specifying Path Model in SEM for CFA
Dear Rick,

It's unclear to me what you mean by constraining each column of the factor matrix to sum to one. If you intend to constrain the loadings on each factor to sum to one, sem() won't do that, since it supports only equality constraints, not general linear constraints on parameters of the model, but why such a constraint would be reasonable in the first place escapes me.

More common in confirmatory factor analysis would be to constrain more of the loadings to zero. Of course, one would do this only if it made substantive sense in the context of the research.

Regards,
 John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: Rick Bilonick [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, August 16, 2006 12:07 PM
> To: John Fox
> Cc: 'R Help'
> Subject: Re: [R] Specifying Path Model in SEM for CFA
>
> On Wed, 2006-08-16 at 08:47 -0400, John Fox wrote:
> > . . .
>
> Thanks for the information. I think I understand how to handle the residual variance after reading the sem help file more carefully. Now I have to figure out how to constrain each column of the factor matrix to sum to one. Maybe this will fix the problem with being under-identified.
>
> Rick B.
[R] Specifying Path Model in SEM for CFA
I'm using specify.model for the sem package. I can't figure out how to represent the residual errors for the observed variables for a CFA model. (Once I get this working I need to add some further constraints.)

Here is what I've tried:

    model.sa <- specify.model()
    F1 -> X1, l11, NA
    F1 -> X2, l21, NA
    F1 -> X3, l31, NA
    F1 -> X4, l41, NA
    F1 -> X5, NA, 0.20
    F2 -> X1, l12, NA
    F2 -> X2, l22, NA
    F2 -> X3, l32, NA
    F2 -> X4, l42, NA
    F2 -> X6, NA, 0.25
    F1 <-> F2, g12, 1
    F1 <-> F1, g11, 1
    F2 <-> F2, g22, 1
    X1 <-> X1, NA, 1
    X2 <-> X2, NA, 1
    X3 <-> X3, NA, 1
    X4 <-> X4, NA, 1
    X5 <-> X5, NA, 1
    X6 <-> X6, NA, 1

This at least converges:

    summary(fit.sem)

     Model Chisquare =  2147   Df =  10 Pr(>Chisq) = 0
     Chisquare (null model) =  2934   Df =  15
     Goodness-of-fit index =  0.4822
     Adjusted goodness-of-fit index =  -0.087387
     RMSEA index =  0.66107   90 % CI: (NA, NA)
     Bentler-Bonnett NFI =  0.26823
     Tucker-Lewis NNFI =  -0.098156
     Bentler CFI =  0.26790
     BIC =  2085.1

     Normalized Residuals
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     -5.990  -0.618   0.192   0.165   1.700   3.950

     Parameter Estimates
          Estimate  Std Error  z value   Pr(>|z|)
     l11 -0.245981  0.21863   -1.12510  0.26054748  X1 <--- F1
     l21 -0.308249  0.22573   -1.36555  0.17207875  X2 <--- F1
     l31  0.202590  0.07910    2.56118  0.01043175  X3 <--- F1
     l41 -0.235156  0.21980   -1.06985  0.28468885  X4 <--- F1
     l12  0.839985  0.21962    3.82476  0.00013090  X1 <--- F2
     l22  0.828460  0.22548    3.67418  0.00023862  X2 <--- F2
     l32  0.066722  0.08369    0.79725  0.42530606  X3 <--- F2
     l42  0.832037  0.21840    3.80963  0.00013917  X4 <--- F2
     g12  0.936719  0.64331    1.45609  0.14536647  F2 <--> F1
     g11  2.567669  1.25608    2.04418  0.04093528  F1 <--> F1
     g22  1.208497  0.55040    2.19567  0.02811527  F2 <--> F2

     Iterations =  59

And it produces the following path diagram:

    path.diagram(fit.sem)

    digraph "fit.sem" {
      rankdir=LR;
      size="8,8";
      node [fontname="Helvetica" fontsize=14 shape=box];
      edge [fontname="Helvetica" fontsize=10];
      center=1;
      "F2" [shape=ellipse]
      "F1" [shape=ellipse]
      "F1" -> "X1" [label="l11"];
      "F1" -> "X2" [label="l21"];
      "F1" -> "X3" [label="l31"];
      "F1" -> "X4" [label="l41"];
      "F1" -> "X5" [label=""];
      "F2" -> "X1" [label="l12"];
      "F2" -> "X2" [label="l22"];
      "F2" -> "X3" [label="l32"];
      "F2" -> "X4" [label="l42"];
      "F2" -> "X6" [label=""];
    }

But I don't see the residual error terms that go into each of the observed variables X1 - X6. I've tried:

    model.sa <- specify.model()
    E1 -> X1, e1, 1
    E2 -> X2, e2, 1
    E3 -> X3, e3, 1
    E4 -> X4, e4, 1
    E5 -> X5, e5, 1
    E6 -> X6, e6, 1
    E1 <-> E1, s1, NA
    E2 <-> E2, s2, NA
    E3 <-> E3, s3, NA
    E4 <-> E4, s4, NA
    E5 <-> E5, s5, NA
    E6 <-> E6, s6, NA
    F1 -> X1, l11, NA
    F1 -> X2, l21, NA
    F1 -> X3, l31, NA
    F1 -> X4, l41, NA
    F1 -> X5, NA, 1
    F2 -> X1, l12, NA
    F2 -> X2, l22, NA
    F2 -> X3, l32, NA
    F2 -> X4, l42, NA
    F2 -> X6, NA, 1
    F1 <-> F2, NA, 1
    F1 <-> F1, NA, 1
    F2 <-> F2, g22, NA
    X1 <-> X1, NA, 1
    X2 <-> X2, NA, 1
    X3 <-> X3, NA, 1
    X4 <-> X4, NA, 1
    X5 <-> X5, NA, 1
    X6 <-> X6, NA, 1

I'm trying to use E1 - E6 as the residual error terms. But I get warning messages about no variances for X1-X6 and it doesn't converge. Also, the associated path diagram:

    digraph "fit.sem" {
      rankdir=LR;
      size="8,8";
      node [fontname="Helvetica" fontsize=14 shape=box];
      edge [fontname="Helvetica" fontsize=10];
      center=1;
      "E1" [shape=ellipse]
      "E2" [shape=ellipse]
      "E3" [shape=ellipse]
      "E4" [shape=ellipse]
      "E5" [shape=ellipse]
      "E6" [shape=ellipse]
      "F2" [shape=ellipse]
      "F1" [shape=ellipse]
      "E1" -> "X1" [label=""];
      "E2" -> "X2" [label=""];
      "E3" -> "X3" [label=""];
      "E4" -> "X4" [label=""];
      "E5" -> "X5" [label=""];
      "E6" -> "X6" [label=""];
      "F1" -> "X1" [label="l11"];
      "F1" -> "X2" [label="l21"];
      "F1" -> "X3" [label="l31"];
      "F1" -> "X4" [label="l41"];
      "F1" -> "X5" [label=""];
      "F2" -> "X1" [label="l12"];
      "F2" -> "X2" [label="l22"];
      "F2" -> "X3" [label="l32"];
      "F2" -> "X4" [label="l42"];
      "F2" -> "X6" [label=""];
    }

has ellipses around the E1-E6 which I believe indicates they are latent factors and not residual errors.

If anyone could point in the right direction I would appreciate it.

Rick B.