I've been trying to figure out how the deltaMass, number of tryptic termini, and number of missed cleavages are used to adjust the discriminant score after the calculation that you show in your message. Can you tell me where in the code this takes place?
Thanks,
Bill

On Mar 11, 11:44 am, Brendan <[email protected]> wrote:
> Hi Amit,
>
> Good questions. First, X! Tandem itself corrects for the case with
> low numbers of matching precursor m/z values by rotating amino acid
> positions in the matching sequences to create new random sequences for
> its expectation value.
>
> Next, I am the one who wrote the X! Tandem f-value for X! Tandem
> native scoring, and of course the k-score f-value is based on the
> original Keller OMICS paper.
>
> On this issue, it is important to remember that the f-value is an
> input to PeptideProphet, which can base its model on other useful
> discriminative information like:
> - Number of termini with expected cleavage (if you use semi-cleavage
>   or unconstrained cleavage in your search)
> - Mass accuracy for data collected on high mass accuracy instruments
> - Matches to decoy sequences added to your FASTA file
> - Hydrophobicity score correlation with retention time
>
> All of these take extra planning in executing your search, but they
> are quite effective in increasing search effectiveness.
>
> http://www.ncbi.nlm.nih.gov/pubmed/19938873
>
> It turns out that existing scoring engines are not as discriminative
> as we might expect, leaving plenty of room for secondary tools like
> PeptideProphet and Percolator to add value. From all of this
> information, PeptideProphet generates a local error probability giving
> you the probability that any single match is actually correct. That
> means 1 out of 10 matches with 0.9 PeptideProphet score will be a
> false positive. And it does this amazingly effectively for most of
> the cases I have looked at.
>
> To do its job, however, PeptideProphet relies on the search engine
> generating two fairly normal distributions, one for true-positive
> matches and another for false-positive matches. K-score generates
> these distributions very well with its single score. X! Tandem native
> score's expect value, while certainly a better discriminative value
> than the native hyper score alone, has distribution problems, and
> doesn't correct for the scoring engine's tendency to give higher
> scores to longer peptides. The f-value I wrote in PeptideProphet
> attempts to both produce a more normal distribution of true-positive
> scores, and correct for shortcomings in the X! Tandem native score.
>
> It applies a set of charge-based weights to the function:
>
> disc = score_wt_ * log((double)tresult->hyper_)
>      + expect_wt_ * (0 - log((double)tresult->expect_))
>      + delta_wt_ * (1.0 - (tresult->next_ / tresult->hyper_));
> disc /= len_wt_ * sqrt((double)strlen(tresult->peptide_));
>
> So, you can see it is making use of the raw hyper score, the expect
> value, the distance of the highest score from the next highest score
> and the length of the peptide matched. Works much better than relying
> on the expect value alone, even without using the extra discriminative
> values I mention above.
>
> But, I will also say that all my testing has shown that k-score is the
> better score to use. It has a different f-value, and PeptideProphet
> works very well with it.
>
> Hope that helps.
>
> --Brendan
>
> On Mar 10, 11:27 pm, Amit Yadav <[email protected]> wrote:
> > Hi Natalie,
> >
> > Thanks for the info but my question is still not answered completely.
> >
> > My question was "*HOW*" TPP does what it does. We all know it extracts
> > more info from results but I wanted to know HOW it calculates the
> > statistically valid p-value (chance of the match being incorrect given
> > the null hypothesis) if the search engine did it already?
> >
> > If you know the database size, you can always calculate the p-value
> > from the e-value and vice versa. So, how does TPP fit in?
> >
> > Let's focus on another very potent problem not much discussed in search
> > engines: in higher mass ranges there are very few candidates to
> > accurately calculate a p-value or e-value (or any statistical measure
> > for that matter). Unless you have a null model (a supposedly bell curve
> > for random hits), you cannot calculate a statistical confidence. And
> > you don't have enough candidates (let's assume just 5 candidates) to
> > draw a distribution for Tandem to calculate the e-value correctly.
> > Curve fitting will have a drastic effect here.
> >
> > How does TPP correct for it, if it does so?
> >
> > Regards,
> >
> > Amit Kumar Yadav
> > Senior Research Fellow (SRF-CSIR)
> > IGIB, New Delhi (India)
> >
> > http://masswiz.igib.res.in
> >
> > On Thu, Mar 11, 2010 at 11:55 AM, Natalie Tasman <[email protected]> wrote:
> > > Good questions and worth explaining for those new to the field.
> > > X!Tandem is a program which assigns peptide sequences ("ID"s) to
> > > ms/ms spectra. We call this type of program a "search engine" (for
> > > "peptide ID search engine" or similar). Other programs in this class
> > > are OMSSA, Sequest, Mascot, and others. Each of these programs can be
> > > run on its own, and outputs a score for each "assigned" peptide. This
> > > score reflects the search engine's estimation of how likely or
> > > confident that assignment is (I use those terms not necessarily as
> > > true stats meanings here.)
> > >
> > > So now you have input of peptide IDs and "scores". The TPP tools (Pep
> > > and Prot Prophet) do two major things of interest. One, they take the
> > > variously-derived scores, compute additional information not
> > > necessarily accounted for by the search engine (asking such questions
> > > as "how reasonable is this sequence, given its termini, length,
> > > hydrophobicity, and so on"), and combine these numbers to arrive at a
> > > statistically valid p value. Two, the TPP tools do this for many
> > > supported search engines, which allows the possibility of comparing
> > > the peptide assignments to other results, i.e. in a publication
> > > context. (For a more complex approach to the last point, look at the
> > > TPP's InterProphet tool.)
> > >
> > > Regarding FDR, I will leave that to someone else to answer in detail.
> > >
> > > Hope this helps,
> > >
> > > Natalie
> > >
> > > On 3/10/10 8:46 PM, Amit Kumar Yadav wrote:
> > >> Dear All,
> > >>
> > >> I was just browsing the gpm site and reading about tandem. It says
> > >> that peptideprophet and proteinprophet need not be used with X!Tandem.
> > >> Can someone explain (in simple terms please) how TPP uses tandem
> > >> result data to assign probabilities?
> > >>
> > >> Another naive question is about calculation of FDR from X!Tandem
> > >> results? How should it be done?
> > >>
> > >> For reference-
> > >> Quoting from http://www.thegpm.org/TANDEM/index.html "Unlike
> > >> some... ... Therefore, separate assembly and statistical analysis
> > >> software, e.g. PeptideProphet and ProteinProphet, do not need to be
> > >> used."
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups
> > > "spctools-discuss" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to [email protected].
> > > For more options, visit this group at
> > > http://groups.google.com/group/spctools-discuss?hl=en.
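
For anyone following the thread, here is a minimal standalone sketch of the discriminant calculation Brendan quotes, written so it can be compiled and played with outside the TPP tree. The struct names, the function name, and the weights passed in are all hypothetical; the actual per-charge weights live in the PeptideProphet source and are not given in this thread. It covers only the quoted formula. It does not include deltaMass, number of tryptic termini, or missed cleavages, which Brendan's message describes as additional information PeptideProphet's model can use.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical stand-in for the search-result fields used in the quoted formula.
struct TandemResult {
    double hyper;         // best hyperscore for the spectrum
    double next;          // next-best hyperscore
    double expect;        // X! Tandem expectation value
    std::string peptide;  // matched peptide sequence
};

// Hypothetical container for the charge-based weights (values are NOT from TPP).
struct ChargeWeights {
    double score_wt;
    double expect_wt;
    double delta_wt;
    double len_wt;
};

// Mirrors the two quoted lines: a weighted sum of log(hyperscore),
// -log(expect), and the relative gap to the next-best score,
// divided by len_wt * sqrt(peptide length).
double tandemDiscriminant(const TandemResult& r, const ChargeWeights& w) {
    // The logs and the division below require positive inputs.
    assert(r.hyper > 0.0 && r.expect > 0.0 && !r.peptide.empty());

    double disc = w.score_wt * std::log(r.hyper)
                + w.expect_wt * (0.0 - std::log(r.expect))
                + w.delta_wt * (1.0 - r.next / r.hyper);
    disc /= w.len_wt * std::sqrt(static_cast<double>(r.peptide.size()));
    return disc;
}
```

With all weights set to 1 and otherwise identical scores, a longer peptide gets a smaller value because of the division by sqrt(length), which is the length correction described in the message.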
