RE: [spctools-discuss] NSP model in iProphet/ProteinProphet; model vs decoy based FDR in ProteinProphet

Eric Deutsch Fri, 07 Feb 2014 14:01:03 -0800

Maybe "decoy fraction" is the right term for this concept?





*From:* [email protected] [mailto:
[email protected]] *On Behalf Of *Dave Trudgian
*Sent:* Thursday, February 06, 2014 2:20 PM
*To:* [email protected]
*Subject:* Re: [spctools-discuss] NSP model in iProphet/ProteinProphet;
model vs decoy based FDR in ProteinProphet



David,



Thanks for the pointer to the iProphet paper - very useful. I'd just been
thinking over a coffee about r=1/3 if ProphetModels could ignore the first
decoy set. Disabling DECOYPROBS on the DECOY1 set hadn't come into my head.
I'd worried in the past about the degeneracy issue, but have just ignored
it so far.



I have been working off the decoy probs downstream to report estimated FDRs
both at model fitting (DECOY1) and on the independent set (DECOY2), with
the latter used for filtering, and the former just as info for the curious.
I guess I can disable DECOYPROBS and just compute FDR on the independent
set, or modify ProphetModels.pl so it can ignore specified (DECOY1)
sequences in its computations. That way the ProphetModels.pl output is
going to be consistent with the downstream stuff.



I guess the only thing I'm left wondering is whether the ProphetModels.pl
help statement might be confusing to others as well? I've always considered a
'ratio' to generally be between two distinct sets, i.e. target:decoy, rather
than a subset vs. total. Maybe it could be explicitly stated?



-r <NUM>  -- Specify decoy ratio (decoy/total sequences). Will guess from
P<0.001 hits if not specified.



Thanks again.



Dave T





On Thursday, February 6, 2014 3:39:20 PM UTC-6, David Shteynberg wrote:

Hi Dave,



r is computed as Decoy / Total over hits with less than 2% probability.
There is a detailed discussion of this in the iProphet paper.
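
As a rough illustration of that estimate, here is a minimal Python sketch. It is not the ProphetModels.pl code: the function name, the (protein name, probability) input format, and the "DECOY" prefix are assumptions made for the example, and the 0.02 threshold is taken from the figure quoted in this thread.

```python
def estimate_decoy_fraction(hits, decoy_prefix="DECOY", max_prob=0.02):
    """Estimate r = decoys / total among low-probability hits.

    hits: iterable of (protein_name, probability) pairs (hypothetical format).
    Only hits with probability <= max_prob are counted, on the assumption
    that these are almost all incorrect matches.
    """
    low = [name for name, prob in hits if prob <= max_prob]
    if not low:
        raise ValueError("no hits at or below the probability threshold")
    decoys = sum(1 for name in low if name.startswith(decoy_prefix))
    return decoys / len(low)


# Toy data: four low-probability hits (two decoys) and one confident hit.
hits = [
    ("DECOY2_P1", 0.001),
    ("sp|P12345", 0.010),
    ("DECOY2_P2", 0.015),
    ("sp|Q67890", 0.000),
    ("sp|P11111", 0.990),  # above the threshold, so not counted
]
r = estimate_decoy_fraction(hits)
print(r)
```

The denominator is every low-probability hit, targets and decoys together, which is why r is a fraction of the total rather than a decoy:target ratio.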





If you have a DB of 50% target 50% decoy and none of the decoys are
discarded (which is one way to use your 50%T 25%D1 25%D2) then r = 0.5



If you discard half of the decoys, e.g. D1 is used for modelling and
DECOYPROBS is disabled (in which case all D1 get probability 0), then all D1
should be excluded from the analysis by ProphetModels.pl. The remaining
decoys, D2, will then constitute roughly 1/3 of the remaining database
entries, and r will be roughly one third (25/75 = 0.3333). In fact, r is
related not only to the protein counts but also to the distinct peptides in
each set of database entries. Because the original database and the decoys
may have degenerate (repeated) peptides, r will only be roughly that
fraction, and will vary depending on the database, how the decoys are
constructed, and how independent D1's decoys are from D2's decoys.
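
The arithmetic for the two cases above can be checked with a short Python sketch. It uses only the nominal entry percentages from this thread (50% target, 25% D1, 25% D2) and ignores peptide degeneracy, so it gives the approximate value of r only. The function names are made up for the example.

```python
from fractions import Fraction


def decoy_fraction(targets, decoys_kept):
    """r as used by -r: decoys kept / (targets + decoys kept)."""
    return Fraction(decoys_kept, targets + decoys_kept)


def decoy_to_target_ratio(targets, decoys_kept):
    """Decoy:target ratio, which is NOT what -r expects."""
    return Fraction(decoys_kept, targets)


# Nominal database composition in percent of entries.
targets, d1, d2 = 50, 25, 25

print(decoy_fraction(targets, d1 + d2))    # no decoys discarded: 1/2
print(decoy_fraction(targets, d2))         # D1 excluded: 1/3
print(decoy_to_target_ratio(targets, d2))  # D2:target ratio: 1/2
```

With D1 excluded, the decoy:target ratio is 1/2 but the decoy fraction is 1/3, which is the source of the -r 0.5 vs. -r 0.33 confusion.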



The iProphet paper carries more info on this than I can put in an email, so
that's a good reference for this.



Cheers,

-David













On Thu, Feb 6, 2014 at 12:17 PM, Dave Trudgian <
[email protected]> wrote:

David,



I just saw Rene's note about the -r 0.25 decoy ratio. I'm similarly using 2
decoy sets (50% target, 25% DECOY_1, 25% DECOY_2) but with -r 0.5. I had
assumed the ratio was supposed to be specified as decoys_used/targets and
there are twice as many targets as DECOY_2s in my case so -r = 0.5.



Having looked in ProphetModels.pl I'm now not so sure.... the estimation if
-r isn't supplied is pp_prob_array / pp_prob_array_decoy for hits with
p<=0.02, but I'm not sure whether this is total/decoy or target/decoy.



Can you confirm which approach is correct?



Not a huge problem for me if -r 0.5 is wrong, as I am computing and using
decoy stats elsewhere, external to TPP. It would just mean the plots from
ProphetModels.pl that are being saved are wrong.



Thanks,



Dave Trudgian



On Thursday, December 19, 2013 2:01:53 AM UTC-6, Rene B wrote:

Hi David,



Thank you for your quick reply and suggestions. The decoy ratio is set to
0.25 as I use two sets of decoys, one for modeling and the other for
validation. Each decoy set corresponds to 25% of entries in the database.



Kind regards,


Rene



On Wednesday, December 18, 2013 8:13:27 PM UTC+1, David Shteynberg wrote:

Hello Rene

Thanks for using the tools and double checking your work.

In my tests I have found that applying the NSP model at the iProphet step
greatly improves performance at the peptide level, and applying the NSP model
at the ProteinProphet step improves performance at the protein level. The
two models are somewhat different, since the ProteinProphet model considers
grouping information while the iProphet model doesn't. I have not found the
two to interfere.

A safe and conservative approach would be to take the more conservative
estimate, e.g. the ProteinProphet probability cutoff that gives 1% error with
decoys or 1% error with the model, whichever is more conservative.
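
One way to make "whichever is more conservative" concrete is sketched below in Python. Everything here is an assumption for illustration rather than what ProphetModels.pl or ProtProphModels.pl actually do: the input is a list of (probability, is_decoy) pairs, the model-based FDR is the mean of (1 - p) over passing targets, and the decoy-based FDR among targets is taken as D(1 - r) / (rT), where D and T are the decoy and target counts above the cutoff and r is the decoy fraction (decoy/total).

```python
def model_fdr(entries, cutoff):
    """Model-based FDR: mean of (1 - p) over target entries with p >= cutoff."""
    passing = [p for p, is_decoy in entries if p >= cutoff and not is_decoy]
    if not passing:
        return 0.0
    return sum(1.0 - p for p in passing) / len(passing)


def decoy_fdr(entries, cutoff, r):
    """Decoy-based FDR among targets, assuming incorrect hits land on
    decoys with frequency r (decoy / total)."""
    decoys = sum(1 for p, is_decoy in entries if p >= cutoff and is_decoy)
    targets = sum(1 for p, is_decoy in entries if p >= cutoff and not is_decoy)
    if targets == 0:
        return 0.0
    return decoys * (1 - r) / (r * targets)


def conservative_cutoff(entries, r, max_fdr=0.01):
    """Lowest observed probability at which BOTH estimates are <= max_fdr.

    Returns None if no cutoff satisfies both estimates.
    """
    for cutoff in sorted({p for p, _ in entries}):
        worst = max(model_fdr(entries, cutoff), decoy_fdr(entries, cutoff, r))
        if worst <= max_fdr:
            return cutoff
    return None


# Toy data: (probability, is_decoy). A 5% target is used because the
# list is far too small for a meaningful 1% estimate.
entries = [
    (0.99, False), (0.98, False), (0.97, False), (0.95, False),
    (0.90, True), (0.60, False), (0.30, True),
]
print(conservative_cutoff(entries, r=0.5, max_fdr=0.05))
```

Requiring the larger of the two estimates to stay under the target means the stricter method always decides the cutoff. In this toy data the model alone would accept a cutoff of 0.90, but the decoy passing at 0.90 pushes the cutoff up to 0.95.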

When the model tends to underestimate error at the protein or peptide level,
this usually stems from underestimation at the spectrum level by
PeptideProphet, and it can be controlled with the CLEVEL={value} option for
PeptideProphetParser (-c{value} for xinteract). Setting this to a number
greater than zero, like 0.5, 1 or 2, will make the model more conservative
overall; a negative value will have the opposite effect. Either way, the
change carries through to the peptide and protein levels.

Also I am curious why you set decoy rate to 0.25?

Best,
David

On Dec 18, 2013 7:29 AM, "Rene B" <[email protected]> wrote:

Hi all,



I am running PeptideProphet, iProphet and ProteinProphet (TPP 4.6.3) on Q
Exactive data searched with Comet, Myrimatch and OMSSA. I wondered if the
NSP model should be disabled in ProteinProphet when it is enabled in
iProphet? I got confused because it seems Petunia enables the NSP model
both in iprophet and proteinprophet by default (ie. when xinteract runs
with the -ip option).



Another question: when I compare decoy-estimated protein FDRs to
ProteinProphet modelled FDRs, ProteinProphet seems a bit optimistic (a
decoy-based FDR of 0.1% corresponds to ~0.02% model FDR). This is with NSP
enabled in iProphet and disabled in ProteinProphet. How should I deal with
this discrepancy, i.e. should I take the decoy-based or the probability-based
FDR to select a probability cutoff?



I have attached some examples for a search with myrimatch only. These are
the commands I used to generate the graphs:



xinteract -Nmyrimatch.pep.xml -OAP -p0 -a%ExperimentFolder% -dDECOY0
-E%ExperimentTag% *.pep.xml

InterProphetParser myrimatch.pep.xml myrimatch.ipro.pep.xml

ProphetModels.pl -i myrimatch.ipro.pep.xml -k -r 0.25 -d "DECOY1"

ProteinProphet myrimatch.ipro.pep.xml myrimatch.prot.xml IPROPHET NONSP

ProtProphModels.pl -k -r 0.25 -d DECOY1 -i myrimatch.prot.xml



The graphs are:



myrimatch_all.ipro.pep_FDR_10pc: PeptideProphet/iProphet decoy vs model
FDR, all models enabled

myrimatch_nonsp.ipro.pep_FDR_10pc: PeptideProphet/iProphet decoy vs model
FDR, NSP model disabled in iProphet

myrimatch_nonsp.prot_FDR_5pc: ProteinProphet decoy vs model FDR, NSP model
disabled in ProteinProphet

myrimatch_all.prot_FDR_5pc: ProteinProphet decoy vs model FDR, NSP model
enabled in iProphet and ProteinProphet



Thanks in advance!



Kind regards,



Rene





-- 
You received this message because you are subscribed to the Google Groups
"spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/spctools-discuss.
For more options, visit https://groups.google.com/groups/opt_out.
