Dear TPP developers and community,

I wanted to point out a potential error in how StPeter estimates protein
mass (nanograms) in the proteome sample. As described in the paper
<https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5891225/>, the program
normalizes the spectral index, dSI, by the protein length, L, and the total
spectral index from the sample, Sum(dSI), as in the formula below:
dSIN = dSI / (L * Sum(dSI))
This is correct for estimating the relative copy number, or *mole fraction*
(the fraction of the total number of protein molecules), of each protein.
However, for nanograms, or *mass fraction* (the fraction of the total
proteome mass), the normalization by L should be omitted.

I hope this makes sense. The mass abundance of each protein is proportional
to *both* its copy number and its length (a protein's molecular weight
scales roughly with its number of residues); therefore, normalization by
length should not be performed for mass abundance estimation.
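The argument can be written out as a short derivation. It assumes the normalization shown above, dSIN_i = dSI_i / (L_i * Sum(dSI)), and approximates a protein's molecular weight M_i by its length times an average residue mass (roughly 110 Da). The symbols n_i, m_i, w_i and the average residue mass are introduced here for illustration only:

```latex
\begin{aligned}
n_i &\propto \mathrm{dSIN}_i = \frac{\mathrm{dSI}_i}{L_i \sum_j \mathrm{dSI}_j}
  && \text{(copy number, i.e. mole fraction)} \\
m_i &= n_i\, M_i \approx n_i\, \bar{m}_{\mathrm{res}}\, L_i
  && \text{(mass of protein } i\text{)} \\
w_i &= \frac{m_i}{\sum_j m_j}
     = \frac{\mathrm{dSIN}_i\, L_i}{\sum_j \mathrm{dSIN}_j\, L_j}
     = \frac{\mathrm{dSI}_i}{\sum_j \mathrm{dSI}_j}
  && \text{(mass fraction)}
\end{aligned}
```

Under these assumptions, the length normalization cancels out. The mass fraction is simply each protein's share of the un-normalized spectral index. The current "ng" value, in contrast, is proportional to dSIN_i / Sum(dSIN), which is the mole fraction.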

Unfortunately, as the StPeter paper says (and as I have verified in the
output), for calculating the nanograms "each protein SIN is divided by the
sum of all proteins’ SIN and multiplied by the protein load in nanograms".
This is effectively using mole fraction in place of mass fraction, which is
incorrect.
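The difference between the two calculations can be illustrated with a minimal sketch. The function names are hypothetical and not part of StPeter. The numbers are toy values: two proteins with equal copy number (equal dSIN) but a tenfold difference in length.

```python
"""Contrast the current "ng" calculation with a mass-fraction-based one.

Hypothetical helper names; toy numbers for illustration only.
"""


def ng_current(dsin, load_ng):
    """Current behaviour: each SIN divided by the sum of all SIN, times the load.

    This distributes the load by mole fraction.
    """
    total = sum(dsin)
    return [s * load_ng / total for s in dsin]


def ng_mass_fraction(dsin, lengths, load_ng):
    """Proposed correction: weight each dSIN by protein length (dSIN * L)
    before normalizing, so the load is distributed by mass fraction."""
    weighted = [s * length for s, length in zip(dsin, lengths)]
    total = sum(weighted)
    return [w * load_ng / total for w in weighted]


if __name__ == "__main__":
    # Two hypothetical proteins: same copy number, lengths 100 and 1000 residues.
    dsin = [1.0, 1.0]
    lengths = [100, 1000]
    load_ng = 1100.0
    print(ng_current(dsin, load_ng))                 # [550.0, 550.0]
    print(ng_mass_fraction(dsin, lengths, load_ng))  # [100.0, 1000.0]
```

With the current formula, both proteins receive the same mass even though one has ten times as many residues per molecule. The length-weighted version assigns mass in proportion to length, as expected for equal copy numbers.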

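The authors (and other users) may not have noticed this error because it is
largely inconsequential for tracking changes in a given protein between
different samples/conditions, since that protein's length is the same in
every sample. However, it becomes significant when the nanogram values are
compared across proteins of different lengths or with other mass
quantitation methods.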

To check consistency: when StPeter's SIN output is used correctly to
estimate mass fractions, i.e., when dSIN * L / Sum(dSIN * L) is calculated
instead of the formula above, the result is highly correlated with that of
spectral counting, as expected. You can see an example below:

[image: StPeter_PSMs_aer.png]
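The consistency check above could be reproduced roughly as follows. This is a sketch under several assumptions. Per-protein dSIN values, lengths and spectral counts are taken to be already matched protein by protein in equal-length lists. Proteins with zero counts are assumed to have been removed, since the comparison is done on a log10 scale. The helper names are hypothetical, and the plot above may use a different scale or correlation measure.

```python
import math


def mass_fractions_from_sin(dsin, lengths):
    """Mass fraction per protein from StPeter SIN: dSIN * L / Sum(dSIN * L)."""
    weighted = [s * length for s, length in zip(dsin, lengths)]
    total = sum(weighted)
    return [w / total for w in weighted]


def spectral_count_fractions(counts):
    """Mass fraction from spectral counting: a protein's spectra / total spectra."""
    total = sum(counts)
    return [c / total for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def log_correlation(dsin, lengths, counts):
    """Correlate log10 mass fractions from StPeter SIN with those from spectral counts."""
    a = [math.log10(f) for f in mass_fractions_from_sin(dsin, lengths)]
    b = [math.log10(f) for f in spectral_count_fractions(counts)]
    return pearson(a, b)
```

In this toy case, the spectral counts are exactly proportional to dSIN * L, so `log_correlation([1.0, 2.0, 4.0], [300, 100, 50], [30, 20, 20])` returns a value equal to 1 up to floating-point rounding. Replacing `mass_fractions_from_sin` with a plain dSIN share (the mole fraction) generally lowers the agreement whenever protein lengths vary.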


The method of mass fraction estimation by spectral counting is already
established in the literature, for example in this paper
<https://www.embopress.org/doi/full/10.15252/msb.20145697>; see the
"Absolute protein quantitation" section: "The absolute abundance of a
protein was calculated by dividing the total number of spectra of all
peptides for that protein by the total number of 14N spectra in the
sample." No normalization by protein length is done, because length has to
be included in the *mass* abundance of a protein.

The paper also verifies the consistency of this method with 15N-labeled
relative quantitation (see their supplementary figure S9). I have also
verified the agreement in my own relative quantitation experiments.

I would be interested in learning your thoughts on this. To obtain protein
mass abundances (or mass fractions), StPeter's "SIn" output (which is
log2[dSIN]) can currently be used as described above, but the "ng" output
needs to be corrected in the source code. Alternatively, the current "ng"
calculation could be relabeled as "copy numbers", scaled by a total copy
number (instead of total nanograms) provided by the user, but that would
probably be of less interest than nanograms.
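As a user-side workaround until the "ng" output is changed, the SIn values could be converted to nanograms roughly like this. The sketch assumes that the per-protein SIn values (log2 of dSIN) and the protein lengths in residues (for example, taken from the search database) have already been read into matching lists; that parsing step is not shown. The function name is hypothetical.

```python
def ng_from_sin_column(sin_log2, lengths, load_ng):
    """Convert StPeter "SIn" values (log2 of dSIN) to nanograms by mass fraction.

    sin_log2: per-protein log2(dSIN) values from the StPeter output.
    lengths:  protein lengths in residues, in the same order.
    load_ng:  total protein load in nanograms.
    """
    dsin = [2.0 ** v for v in sin_log2]  # undo the log2 transform
    weighted = [s * length for s, length in zip(dsin, lengths)]  # dSIN * L
    total = sum(weighted)  # Sum(dSIN * L)
    return [w * load_ng / total for w in weighted]
```

For example, `ng_from_sin_column([-10.0, -11.0], [100, 400], 600.0)` gives `[200.0, 400.0]`. The second protein has half the dSIN (copy number) but four times the length, so it carries twice the mass.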

Please let me know what you think.

Thank you,
Farshad

-- 
You received this message because you are subscribed to the Google Groups 
"spctools-discuss" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/spctools-discuss/CAFyEx3x%2BeKx6%2BUZeXvzf8WN7x98wHkwrk1Y9Lm2HSprVZ52HJQ%40mail.gmail.com.
