Re: [Ifeffit] How to calculate F-value for XANES PCA results

2010-01-11 Thread Wayne Lukens

Hi Andrew,

I don't completely understand what your question is, but I will try to
answer you.  First, Sam Webb should be able to answer exactly how F was
calculated in Six-Pack. However, I can tell you how F is calculated in
general. First of all, the number in the Excel spread-sheet is probably
not F, it is the probability of F. In general, if the probability of F
is greater than 5%, that component is part of the noise. This is the
same as saying that that component is within 2-sigma of the noise.

F is easy to calculate for least-squares fitting, and is nicely
explained by Wikipedia in the regression problems section:

http://en.wikipedia.org/wiki/F-test

where RSS is chi-square and n is the number of independent data points,
which is the lesser of the number of data points or the spectral range
divided by the resolution. In your example, you have 18 independent
data points (36 eV range divided by a 2 eV resolution).
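
As a rough illustration of the counting rule above, the sketch below takes the lesser of the number of measured points and the spectral range divided by the resolution. The function name `independent_points` and the figure of 73 measured points (a 36 eV range sampled every 0.5 eV, endpoints included) are assumptions made for the example, not values taken from the spreadsheet.

```python
import math


def independent_points(n_measured, range_ev, resolution_ev):
    """Return the lesser of the measured point count and range / resolution."""
    return min(n_measured, math.floor(range_ev / resolution_ev))


# Assumed example: 36 eV range, 0.5 eV steps (73 points), 2 eV resolution.
n = independent_points(73, 36.0, 2.0)
print(n)  # prints 18
```

If fewer points were measured than the resolution allows, the measured count is the limit instead.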

You then need to calculate the probability of that value of F given the
number of parameters and number of independent data points, which is not
explained by the wiki article but can easily be done in Excel.
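
For readers without Excel, the probability of F can also be sketched in plain Python. The code below is a minimal sketch, assuming that the quantity wanted is the right-tail probability P(F > f), which is what Excel's FDIST(f, df1, df2) returns. It evaluates the regularized incomplete beta function with a standard continued-fraction expansion. The function names (`f_probability` and the helpers) are hypothetical, and the example F value and degrees of freedom are invented. If SciPy is available, `scipy.stats.f.sf` should give the same number.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=200, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_probability(f_value, df_num, df_den):
    """Right-tail probability P(F > f_value), as Excel's FDIST reports it."""
    if f_value <= 0.0:
        return 1.0
    x = df_den / (df_den + df_num * f_value)
    return regularized_incomplete_beta(df_den / 2.0, df_num / 2.0, x)


# Invented example: F = 3.0 with 2 and 15 degrees of freedom.
p = f_probability(3.0, 2, 15)
print(p, "retain" if p < 0.05 else "treat as noise")
```

The first argument is the F value itself; the 5% threshold is applied to the returned probability.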

In your example, you have two components with probability of F less than
5%; these would be the components that you would retain.

Sincerely,

Wayne
--
Wayne Lukens
Staff Scientist
Lawrence Berkeley National Laboratory
email: wwluk...@lbl.gov
phone: (510) 486-4305
FAX: (510) 486-5596


Andrew wrote:



Hi everyone,

 

I was looking through the literature on how to handle PC analysis data 
and saw that there are several different methods you can use for 
determining how many components there are in the series of scans. 
Included in SixPack are the indicator function, scree test, and the 
ability to quickly do the reduced eigenvalue ratios. I’ve been digging 
through the literature as to how to calculate the F-values. The closest 
to an answer that I got was:


 

“The above-mentioned reduction of the body of experimental data, that 
is, the decision of what components correspond to the noise and what are 
the principal components, is now made on the basis of an F test of the 
variance associated with eigenvalue k and the summed variance associated 
with noise eigenvalues (k+1, ..., c). The null hypothesis is that a 
given factor k is a member of the pool of noise factors. The 
probability that an F value would be higher than the current value 
is given by %SL (percentage of significance level). Thus, the kth 
factor is accepted as a principal component if %SL is lower than some 
test level.” (Garcia 1995).


 

We ran the PCA on the reduction of iron while scanning at increased 
temperatures. I checked the foil standard but did not see any shift in 
the max at 7112 eV. We scanned at 0.5 eV intervals (2 eV resolution at the 
beam). I thought I understood what that statement was saying, but I’m 
almost certain I’m doing something wrong. I have attached the .xlsx file 
that I was working on and hope someone can point me in the right 
direction. The file includes the components of the PCA and some of the 
variances that I was calculating. If there is a paper in which someone 
shows an actual calculation of this in the supplemental materials, that 
would be exactly what I am looking for!


 


Thanks for the help!

Andrew Campos

 

Fernandez-Garcia, M., C. Marquez Alvarez, and G.L. Haller, The Journal 
of Physical Chemistry, 1995, 99(33), 12565-12569.





___
Ifeffit mailing list
Ifeffit@millenia.cars.aps.anl.gov
http://millenia.cars.aps.anl.gov/mailman/listinfo/ifeffit





Re: [Ifeffit] How to calculate F-value for XANES PCA results

2010-01-11 Thread Andrew
Hi everyone,

 

Thank you Dr. Lukens for your help! Let me see if I understand the method
that was described for the F-value for the variances using the
Fernandez-Garcia definition (that was previously mentioned), and please
correct me if I am mistaken. 

 

The Principal Component Analysis returns the eigenvectors. Then, to
calculate the F-value using the Fernandez-Garcia definition:

 

F-value for component 1 = (variance of eigenvector 1) / summation[(variance
of eigenvector 2) + (variance of eigenvector 3) + ... + (variance of
eigenvector c)]

F-value for component 2 = (variance of eigenvector 2) / summation[(variance
of eigenvector 3) + (variance of eigenvector 4) + ... + (variance of
eigenvector c)]

F-value for component k = (variance of eigenvector k) / summation[(variance
of eigenvector k+1) + ... + (variance of eigenvector c)]

 

Where c is the number of components in the set.
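
The ratio described above can be written out as a short Python sketch. It transcribes the expression exactly as stated in this message (the variance of component k divided by the summed variances of components k+1 through c) and is not checked against the published Fernandez-Garcia definition, which may include normalization that is not reproduced here. The function name `variance_ratio_f` and the variance values are invented for illustration.

```python
def variance_ratio_f(variances):
    """Return [F_1, ..., F_(c-1)] with F_k = var_k / sum(var_(k+1) .. var_c).

    The last component has no remaining noise pool, so no ratio is
    returned for it.
    """
    ratios = []
    for k in range(len(variances) - 1):
        noise_pool = sum(variances[k + 1:])  # summed variance of later components
        ratios.append(variances[k] / noise_pool)
    return ratios


# Invented variances for a four-component set, ordered largest first.
print(variance_ratio_f([10.0, 3.0, 1.0, 1.0]))  # prints [2.0, 1.5, 1.0]
```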

 

Then, to calculate the probability that F corresponds to noise, Excel can
be used with the function Fdist(alpha, degree of freedom 1, degree of
freedom 2).

 

Alpha = the confidence interval desired (where 0.05 is generally used)

degrees of freedom 1 = # of independent data points - 1 (this is dependent
on the resolution of the beam, and Dr. Lukens provided an example calc.)

degrees of freedom 2 = number of components in the denominator for the
F-value being tested - 1 (i.e. for component k it would equal c-k-1-1 or
c-k-2)
 
Then, if the probability of F is less than 5%, these would be the components
that you would retain.
 

Are these equations correct? Am I using the correct equation based on the
Fernandez-Garcia definition? My main misunderstanding of this was what
equation to use for the F-value. Sorry for beating a dead horse, but is this
definition of degree of freedom 2 correct?

 

Thanks again for all the help and sorry if this is poorly worded, and if
this is on the outer-bounds for an IFEFFIT-relevant question.

Andrew



Re: [Ifeffit] How to calculate F-value for XANES PCA results

2010-01-11 Thread Wayne Lukens

Hi Andrew,

It is easier to use FDIST to calculate the probability that a given
value of F is within the noise.  The way to do this is
to use FDIST(F, (v2-v1), v1), where F, v1, and v2 are explained below.

If you have two models, model 1 and model 2, where model 1 has an
additional component versus model 2, and the chi-squared, number
of parameters, and degrees of freedom for model 1 are X1, p1, and
v1, where v1=n-p1 and n is the number of independent data points, and
model 2 has similarly defined X2, p2, and v2, then



F= [(X2-X1)*v1]/[(v2-v1)*X1]

The probability of F is FDIST(F, (v2-v1), v1), where v2-v1 = p1-p2 is
the number of additional parameters in model 1. I assume the same
is true for the PCA analysis; however, I don't know what the values
for v and p are in this case. The formula for F looks right.
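
A minimal sketch of the two-model F calculation follows, assuming the conventional form in which the numerator degrees of freedom equal the number of extra parameters in the larger model (p1 - p2) and the denominator degrees of freedom are v1 = n - p1. The function and argument names are hypothetical, and the numbers in the example are invented purely to show the arithmetic.

```python
def nested_model_f(chi2_full, p_full, chi2_reduced, p_reduced, n_independent):
    """Return (F, numerator df, denominator df) for two nested fits.

    The "full" model is the one with the additional component (model 1);
    the "reduced" model is the one without it (model 2).
    """
    v_full = n_independent - p_full   # v1 = n - p1
    extra = p_full - p_reduced        # p1 - p2, equal to v2 - v1
    if extra <= 0 or v_full <= 0:
        raise ValueError("full model needs more parameters and positive df")
    f_value = ((chi2_reduced - chi2_full) * v_full) / (extra * chi2_full)
    return f_value, extra, v_full


# Invented numbers: adding one component halves chi-squared, n = 18.
print(nested_model_f(5.0, 3, 10.0, 2, 18))  # prints (15.0, 1, 15)
```

The returned F and the two degrees of freedom are the three arguments that would then be passed to FDIST.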

At any rate, once you have done the PCA analysis, you need to figure
out which standard spectra span the components (there should be the
same number of standards as components). Then you need to fit your
experimental spectra using the standards. At that point you can
apply the F-test to determine whether the contribution from that
standard is greater than 2 sigma over the noise. I hope that makes
sense?

Sincerely,

Wayne






--
Wayne Lukens
Staff Scientist
Lawrence Berkeley National Laboratory
email: wwluk...@lbl.gov
phone: (510) 486-4305
FAX: (510) 486-5596