[spctools-discuss] Re: PeptideProphet for X!Tandem

Brendan Thu, 11 Mar 2010 08:44:41 -0800

Hi Amit,
Good questions.  First, X! Tandem itself corrects for the case with
low numbers of matching precursor m/z values by rotating amino acid
positions in the matching sequences to create new random sequences for
its expectation value.
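
As a rough illustration of the rotation idea only, here is a minimal sketch that produces the cyclic rotations of a peptide string. The function name cyclic_rotations is hypothetical, and this is not taken from the X! Tandem source, which may treat terminal residues or other details differently. The point is simply that every rotated sequence keeps the same amino acid composition, and therefore the same precursor mass, as the original match.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the sequences obtained by rotating `peptide`
// left by 1, 2, ..., n-1 positions. Composition is unchanged, so each
// rotation has the same mass as the input. For repetitive sequences some
// rotations may be identical to the input.
std::vector<std::string> cyclic_rotations(const std::string& peptide) {
    assert(!peptide.empty());
    std::vector<std::string> out;
    std::string rotated = peptide;
    for (std::size_t i = 1; i < peptide.size(); ++i) {
        // Move the first residue to the end.
        std::rotate(rotated.begin(), rotated.begin() + 1, rotated.end());
        out.push_back(rotated);
    }
    return out;
}
```

Calling cyclic_rotations("PEPK") would give the three sequences EPKP, PKPE and KPEP, each of which could then be scored against the spectrum as an extra random candidate.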


Next, I am the one who wrote the X! Tandem f-value for X! Tandem
native scoring, and of course the k-score f-value is based on the
original Keller OMICS paper.

On this issue, it is important to remember that the f-value is an
input to PeptideProphet, which can base its model on other useful
discriminative information like:
- Number of termini with expected cleavage (if you use semi-cleavage
or unconstrained cleavage in your search)
- Mass accuracy for data collected on high mass accuracy instruments
- Matches to decoy sequences added to your FASTA file
- Hydrophobicity score correlation with retention time

All of these take extra planning when setting up your search, but they
are quite effective at improving the separation of correct from
incorrect matches.

http://www.ncbi.nlm.nih.gov/pubmed/19938873

It turns out that existing scoring engines are not as discriminative
as we might expect, leaving plenty of room for secondary tools like
PeptideProphet and Percolator to add value.  From all of this
information, PeptideProphet generates a probability that any single
match is actually correct; one minus that value is the local error
rate.  That means, on average, 1 out of 10 matches with a
PeptideProphet probability of 0.9 will be a false positive.  And it
does this amazingly effectively for most of the cases I have looked at.
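
As a rough illustration of that arithmetic, the sketch below sums (1 - p) over the matches accepted at a probability cutoff to get the expected number of false positives, and divides by the number accepted to get an estimated error rate for the list. The names ErrorEstimate and estimate_errors are hypothetical, and this is not code from the TPP; it only assumes that each probability really is the chance that the match is correct.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Summary of the matches accepted at a given probability cutoff.
struct ErrorEstimate {
    std::size_t accepted;   // matches with probability >= cutoff
    double expected_false;  // sum of (1 - p) over accepted matches
    double fdr;             // expected_false / accepted (0 if none accepted)
};

// Hypothetical helper: estimates the error rate among accepted matches
// from per-match probabilities of being correct.
ErrorEstimate estimate_errors(const std::vector<double>& probabilities,
                              double cutoff) {
    assert(cutoff >= 0.0 && cutoff <= 1.0);
    ErrorEstimate est{0, 0.0, 0.0};
    for (double p : probabilities) {
        assert(p >= 0.0 && p <= 1.0);
        if (p >= cutoff) {
            ++est.accepted;
            est.expected_false += 1.0 - p;
        }
    }
    if (est.accepted > 0) {
        est.fdr = est.expected_false / static_cast<double>(est.accepted);
    }
    return est;
}
```

With ten matches all at probability 0.9 and a cutoff of 0.9, this gives about one expected false positive, which is the 1 in 10 figure above. In a real result list most accepted matches score well above the cutoff, so the estimated error rate for the whole list is usually lower than 1 minus the cutoff.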

To do its job, however, PeptideProphet relies on the search engine
generating two fairly normal distributions, one for true-positive
matches and another for false-positive matches.  K-score generates
these distributions very well with its single score.  X! Tandem native
score's expect value, while certainly a better discriminative value
than the native hyper score alone, has distribution problems, and
doesn't correct for the scoring engine's tendency to give higher
scores to longer peptides.  The f-value I wrote in PeptideProphet
attempts to both produce a more normal distribution of true-positive
scores, and correct for shortcomings in the X! Tandem native score.

It applies a set of charge-based weights to the function:

disc = score_wt_ * log((double)tresult->hyper_)
     + expect_wt_ * (0 - log((double)tresult->expect_))
     + delta_wt_ * (1.0 - (tresult->next_ / tresult->hyper_));
disc /= len_wt_ * sqrt((double)strlen(tresult->peptide_));

So, you can see it is making use of the raw hyper score, the expect
value, the distance of the highest score from the next highest score
and the length of the peptide matched.  Works much better than relying
on the expect value alone, even without using the extra discriminative
values I mention above.
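
For anyone who wants to experiment with the formula outside of the TPP, here is a minimal self-contained sketch of the same calculation. The TandemResult and ChargeWeights structs and the discriminant function are hypothetical stand-ins for illustration, not the actual PeptideProphet classes, and the real charge-based weight values are not reproduced here, so any weights you pass in are your own assumption.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical stand-in for the search result fields used in the formula.
struct TandemResult {
    double hyper;         // best hyperscore for the spectrum
    double next;          // next-best hyperscore
    double expect;        // expectation value of the best match
    std::string peptide;  // matched peptide sequence
};

// Hypothetical container for the weights of one charge state.
struct ChargeWeights {
    double score_wt;
    double expect_wt;
    double delta_wt;
    double len_wt;
};

// Same arithmetic as the two-line formula above: a weighted sum of
// log(hyperscore), -log(expect) and the relative gap to the next-best
// score, divided by a length term.
double discriminant(const TandemResult& r, const ChargeWeights& w) {
    assert(r.hyper > 0.0 && r.expect > 0.0 && !r.peptide.empty());
    assert(w.len_wt != 0.0);
    double disc = w.score_wt * std::log(r.hyper)
                + w.expect_wt * (0.0 - std::log(r.expect))
                + w.delta_wt * (1.0 - r.next / r.hyper);
    disc /= w.len_wt * std::sqrt(static_cast<double>(r.peptide.size()));
    return disc;
}
```

With all four weights set to 1.0, two results with identical scores but peptides of length 4 and 16 differ by a factor of two in the discriminant, which shows how the square-root length term damps the advantage of longer peptides.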

But, I will also say that all my testing has shown that k-score is the
better score to use.  It has a different f-value, and PeptideProphet
works very well with it.
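
To show why the shape of the true-positive and false-positive score distributions matters, here is a minimal sketch of how a probability can be derived from an f-value once those two distributions are known. It assumes both are normal, as described above, and that the fraction of correct matches (prior) is already estimated; the names Normal, normal_pdf and probability_correct are hypothetical. This is not the PeptideProphet code, which fits its model to the data and also folds in the other discriminative information listed earlier.

```cpp
#include <cassert>
#include <cmath>

// Parameters of one normal component (values are supplied by the caller).
struct Normal {
    double mean;
    double sd;
};

// Density of a normal distribution at x.
double normal_pdf(double x, const Normal& n) {
    assert(n.sd > 0.0);
    const double kSqrtTwoPi = 2.5066282746310002;
    const double z = (x - n.mean) / n.sd;
    return std::exp(-0.5 * z * z) / (n.sd * kSqrtTwoPi);
}

// Bayes' rule for a two-component mixture: probability that a match with
// discriminant score f is correct, given the prior fraction of correct
// matches. Scores so extreme that both densities underflow to zero are
// rejected by the final assertion.
double probability_correct(double f, const Normal& correct,
                           const Normal& incorrect, double prior) {
    assert(prior >= 0.0 && prior <= 1.0);
    const double pos = prior * normal_pdf(f, correct);
    const double neg = (1.0 - prior) * normal_pdf(f, incorrect);
    assert(pos + neg > 0.0);
    return pos / (pos + neg);
}
```

If the two distributions overlap heavily or are badly non-normal, the fitted densities are poor and so are the probabilities, which is why a score that produces clean distributions matters so much.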

Hope that helps.

--Brendan

On Mar 10, 11:27 pm, Amit Yadav <[email protected]> wrote:
> Hi Natalie,
>
> Thanks for the info but my question is still not answered completely.
>
> My question was "*HOW*" TPP does what it does. We all know it extracts more
> info from results, but I wanted to know HOW it calculates the
> statistically valid p-value (chance of the match being incorrect given the null
> hypothesis) if the search engine did it already.
>
> If you know the database size, you can always calculate the p-value from the
> e-value and vice versa. So, how does TPP fit in?
>
> Let's focus on another very potent problem not much discussed in search
> engines: in higher mass ranges there are very few candidates to accurately
> calculate a p-value or e-value (or any statistical measure for that matter).
> Unless you have a null model (a supposedly bell-shaped curve for random hits),
> you cannot calculate a statistical confidence. And you don't have enough
> candidates (let's assume just 5 candidates) to draw a distribution for Tandem to
> calculate the e-value correctly.
> Curve fitting will have a drastic effect here.
>
> How does TPP correct for it, if it does so?
>
> Regards,
>
> Amit Kumar Yadav
> Senior Research Fellow (SRF-CSIR)
> IGIB, New Delhi (India)
>
> http://masswiz.igib.res.in
>
> On Thu, Mar 11, 2010 at 11:55 AM, Natalie Tasman <[email protected]> wrote:
> > Good questions and worth explaining for those new to the field.  X!Tandem
> > is a program which assigns peptide sequences ("ID"s) to ms/ms spectra.  We
> > call this type of program a "search engine" (for "peptide ID search engine"
> > or similar).  Other programs in this class are OMSSA, Sequest, Mascot, and
> > others.  Each of these programs can be run on its own, and outputs a score
> > for each "assigned" peptide.  This score reflects the search engine's
> > estimation of how likely or confident that assignment is (I use those terms
> > not necessarily as true stats meanings here.)
>
> > So now you have input of peptide IDs and "scores".  The TPP tools (Pep and
> > Prot Prophet) do two major things of interest.  One, they take the
> > variously derived scores, compute additional information not necessarily
> > accounted for by the search engine (asking such questions as "how reasonable is
> > this sequence, given its termini, length, hydrophobicity, and so on?"), and
> > combine these numbers to arrive at a statistically valid p value.  Two, the
> > TPP tools do this for many supported search engines, which allows the
> > possibility of comparing the peptide assignments to other results, i.e. in a
> > publication context.  (For a more complex approach to the last point, look
> > at the TPP's InterProphet tool.)
>
> > Regarding FDR, I will leave that to someone else to answer in detail.
>
> > Hope this helps,
> > Natalie
>
> > On 3/10/10 8:46 PM, Amit Kumar Yadav wrote:
>
> >> Dear All,
>
> >> I was just browsing the gpm site and reading about tandem. It says
> >> that peptideprophet and proteinprophet need not be used with X!Tandem.
> >> Can someone explain (in simple terms please) how TPP uses tandem
> >> result data to assign probabilities?
>
> >> Another naive question is about calculation of FDR from X!Tandem
> >> results? How should it be done?
>
> >> For reference-
> >> Quoting from http://www.thegpm.org/TANDEM/index.html "Unlike
> >> some... ... Therefore, separate assembly and statistical analysis
> >> software, e.g. PeptideProphet and ProteinProphet, do not need to be
> >> used."
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "spctools-discuss" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected]<spctools-discuss%[email protected]>
> > .
> > For more options, visit this group at
> >http://groups.google.com/group/spctools-discuss?hl=en.

