Re: [Ifeffit] How to calculate F-value for XANES PCA results
Hi Andrew,

I don't completely understand your question, but I will try to answer it. First, Sam Webb should be able to tell you exactly how F was calculated in SixPack. However, I can tell you how F is calculated in general.

First of all, the number in the Excel spreadsheet is probably not F; it is the probability of F. In general, if the probability of F is greater than 5%, that component is part of the noise. This is the same as saying that the component is within 2 sigma of the noise.

F is easy to calculate for least-squares fitting and is nicely explained by Wikipedia in the regression problems section: http://en.wikipedia.org/wiki/F-test where RSS is chi-square and n is the number of independent data points, which is the lesser of the number of data points or the spectral range divided by the resolution. In your example, you have 18 independent data points (a 36 eV range divided by a 2 eV resolution). You then need to calculate the probability of that value of F given the number of parameters and the number of independent data points, which is not explained in the Wikipedia article but can easily be done in Excel.

In your example, you have two components with a probability of F less than 5%; these are the components that you would retain.

Sincerely,
Wayne

--
Wayne Lukens
Staff Scientist
Lawrence Berkeley National Laboratory
email: wwluk...@lbl.gov
phone: (510) 486-4305
FAX: (510) 486-5596

Andrew wrote:

Hi everyone,

I was looking through the literature on how to handle PC analysis data and saw that there are several different methods for determining how many components there are in a series of scans. Included in SixPack are the indicator function, the scree test, and the ability to quickly compute the reduced eigenvalue ratios. I've been digging through the literature on how to calculate the F-values.
The closest to an answer that I got was: "The above-mentioned reduction of the body of experimental data, that is, the decision of what components correspond to the noise and what are the principal components, is now made on the basis of an F test of the variance associated with eigenvalue k and the summed variance associated with noise eigenvalues (k+1, ..., c). The null hypothesis is that a given factor k is a member of the pool of noise factors. The probability that an F value would be higher than the current value is given by %SL (percentage of significance level). Thus, the kth factor is accepted as a principal component if %SL is lower than some test level." (Fernandez-Garcia et al., 1995)

We ran the PCA on the reduction of iron while scanning at increasing temperatures. I checked the foil standard but did not see any shift in the maximum at 7112 eV; we scanned at 0.5 eV intervals (2 eV resolution at the beam). I thought I understood what that statement was saying, but I'm almost certain I'm doing something wrong. I have attached the .xlsx file that I was working on and hope someone can point me in the right direction. The file includes the components of the PCA and some of the variances that I was calculating. If there is a paper that shows an actual calculation of this in the supplemental materials, that would be exactly what I am looking for!

Thanks for the help!
Andrew Campos

Fernandez-Garcia, M., C. Marquez Alvarez, and G. L. Haller, The Journal of Physical Chemistry, 1995, 99(33), 12565-12569.

___
Ifeffit mailing list
Ifeffit@millenia.cars.aps.anl.gov
http://millenia.cars.aps.anl.gov/mailman/listinfo/ifeffit
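A minimal Python sketch of the two rules of thumb in Wayne's reply above: the number of independent points is the lesser of the number of data points and the spectral range divided by the resolution, and components whose probability of F is below 5% are retained. The function names, the truncation of a non-integer range/resolution ratio, and the assumption that the 36 eV window is sampled inclusively at 0.5 eV steps are illustrative choices, not how SixPack does it.

```python
def independent_points(n_points: int, energy_range_ev: float, resolution_ev: float) -> int:
    """Lesser of the number of data points and (spectral range / resolution).

    Truncating a non-integer ratio is an assumption made for this sketch.
    """
    return min(n_points, int(energy_range_ev // resolution_ev))


def retained_components(probabilities, level=0.05):
    """1-based indices of components whose probability of F is below `level`.

    Components at or above `level` are treated as part of the noise.
    """
    return [i + 1 for i, p in enumerate(probabilities) if p < level]


# Assumed scan: a 36 eV window sampled every 0.5 eV (73 points), 2 eV resolution.
n_points = int(36 / 0.5) + 1
print(independent_points(n_points, 36.0, 2.0))  # 18
```

Here the resolution, not the step size, limits the count, which is why 0.5 eV sampling still yields only 18 independent points.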
Re: [Ifeffit] How to calculate F-value for XANES PCA results
Hi everyone,

Thank you, Dr. Lukens, for your help! Let me see if I understand the method described for the F-value of the variances using the Fernandez-Garcia definition (mentioned previously), and please correct me if I am mistaken.

The Principal Component Analysis returns the eigenvectors. Then, to calculate the F-value using the Fernandez-Garcia definition:

F-value for component 1 = (variance of eigenvector 1) / [(variance of eigenvector 2) + (variance of eigenvector 3) + … + (variance of eigenvector c)]
F-value for component 2 = (variance of eigenvector 2) / [(variance of eigenvector 3) + (variance of eigenvector 4) + … + (variance of eigenvector c)]
F-value for component k = (variance of eigenvector k) / [(variance of eigenvector k+1) + … + (variance of eigenvector c)]

where c is the number of components in the set.

Then, to calculate the probability that F corresponds to noise, Excel can use the function FDIST(alpha, degrees of freedom 1, degrees of freedom 2), where:

alpha = the confidence interval desired (0.05 is generally used)
degrees of freedom 1 = number of independent data points - 1 (this depends on the resolution of the beam; Dr. Lukens provided an example calculation)
degrees of freedom 2 = number of components in the denominator of the F-value being tested - 1 (i.e., for component k it would equal c-k-1-1, or c-k-2)

Then, if the probability of F is less than 5%, these would be the components that you would retain.

Are these equations correct? Am I using the correct equation based on the Fernandez-Garcia definition? My main confusion was which equation to use for the F-value. Sorry for beating a dead horse, but is this definition of degrees of freedom 2 correct? Thanks again for all the help, and sorry if this is poorly worded or on the outer bounds of an IFEFFIT-relevant question.
Andrew
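For reference, the ratio exactly as written in Andrew's message above (variance of eigenvector k over the summed variance of eigenvectors k+1 through c) can be sketched in Python as below. This is a sketch under stated assumptions: the variances are already sorted from largest to smallest, no degrees-of-freedom weighting is applied, and it has not been checked against the Fernandez-Garcia paper. Because the degrees of freedom are the open question in this thread, no probability is computed here. The function name and the input values are hypothetical.

```python
def eigenvalue_f_ratios(variances):
    """Return [F_1, ..., F_(c-1)], where F_k = var_k / (var_(k+1) + ... + var_c).

    `variances` must be ordered from the largest eigenvalue to the smallest.
    The last component has no later components, so it gets no ratio.
    """
    ratios = []
    for k in range(len(variances) - 1):
        noise_pool = sum(variances[k + 1:])  # summed variance of the remaining components
        ratios.append(variances[k] / noise_pool)
    return ratios


# Made-up variances for four components.
print(eigenvalue_f_ratios([8.0, 1.5, 0.25, 0.25]))  # [4.0, 3.0, 1.0]
```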
Re: [Ifeffit] How to calculate F-value for XANES PCA results
Hi Andrew,

It is easier to use FDIST to calculate the probability that a given value of F is within the noise. The way to do this is to use FDIST(F, (v2-v1), v1), where F, v1, and v2 are explained below.

If you have two models, model 1 and model 2, where model 1 has an additional component versus model 2, and the chi-squared, number of parameters, and degrees of freedom for model 1 are X1, p1, and v1, where v1 = n-p1 and n is the number of independent data points, and model 2 has similarly defined X2, p2, and v2, then

F = [(X2-X1)*v1] / [(v2-v1)*X1]

and the probability of F is FDIST(F, (v2-v1), v1). Because model 1 has more parameters, v2-v1 = p1-p2 is positive.

I assume the same is true for the PCA analysis; however, I don't know what the values of v and p are in this case. The formula for F looks right.

At any rate, once you have done the PCA analysis, you need to figure out which standard spectra span the components (there should be the same number of standards as components). Then you need to fit your experimental spectra using the standards. At that point, you can apply the F-test to determine whether the contribution from each standard is greater than 2 sigma above the noise.

I hope that makes sense.

Sincerely,
Wayne
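A self-contained Python sketch of the nested-model F-test, usable without a spreadsheet. `fdist` returns the right-tailed probability P(F > f), which is what Excel's FDIST(x, df1, df2) reports; here it is computed from the regularized incomplete beta function with the continued-fraction method from Numerical Recipes. `nested_f` uses the standard nested-model form, in which the numerator degrees of freedom are v2 - v1 (equal to p1 - p2, the number of added parameters) and the denominator degrees of freedom are v1. The function names and the chi-square values in the example are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def fdist(f, df1, df2):
    """Right-tailed F probability P(F > f), the quantity Excel's FDIST(f, df1, df2) returns."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def nested_f(x1, v1, x2, v2):
    """F for model 1 (one extra component; chi-square x1, v1 = n - p1)
    versus model 2 (chi-square x2, v2 = n - p2), with v2 > v1."""
    return ((x2 - x1) * v1) / ((v2 - v1) * x1)


# Hypothetical example: n = 18 independent points, model 2 has 1 component,
# model 1 adds a second; the chi-square values are made up for illustration.
n = 18
v1, v2 = n - 2, n - 1
f = nested_f(x1=2.0, v1=v1, x2=4.0, v2=v2)
print(f, fdist(f, v2 - v1, v1))
```

If the printed probability is below 0.05, the extra component would be kept under the 5% criterion used in this thread.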