Re: [spctools-discuss] NSP model in iProphet/ProteinProphet; model vs decoy based FDR in ProteinProphet

Dave Trudgian Thu, 06 Feb 2014 14:20:33 -0800

David,

Thanks for the pointer to the iProphet paper - very useful. I'd just been 
thinking over a coffee about r=1/3 if ProphetModels could ignore the first 
decoy set. Disabling DECOYPROBS on the DECOY1 set hadn't come into my head. 
I'd worried in the past about the degeneracy issue, but have just ignored 
it so far.


I have been working off the decoy probs downstream to report estimated FDRs 
both at model fitting (DECOY1) and on the independent set (DECOY2), with 
the latter used for filtering, and the former just as info for the curious. 
I guess I can disable DECOYPROBS and just compute FDR on the independent 
set, or modify ProphetModels.pl so it can ignore specified (DECOY1) 
sequences in its computations. That way the ProphetModels.pl output is 
going to be consistent with the downstream stuff.
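
For the first option (DECOYPROBS disabled, FDR from the independent set only), a rough Python sketch of the calculation might look like the following. It assumes incorrect hits fall uniformly on database entries, so with DECOY1 excluded and r = decoy/total = 1/3, each DECOY2 hit stands for (1 - r)/r = 2 incorrect target hits. The function name, the (probability, protein name) input format and the DECOY1/DECOY2 name prefixes are assumptions for illustration only; this is not taken from ProphetModels.pl.

```python
def decoy_fdr(hits, min_prob, r=1/3, count_prefix="DECOY2", ignore_prefix="DECOY1"):
    """Estimate the target FDR at a probability cutoff from one decoy set.

    hits: iterable of (probability, protein_name) pairs (assumed format).
    r: decoy / total ratio among incorrect hits, after ignored entries are dropped.
    """
    n_target = n_decoy = 0
    for prob, protein in hits:
        if prob < min_prob or protein.startswith(ignore_prefix):
            continue  # below the cutoff, or a model-fitting decoy we exclude
        if protein.startswith(count_prefix):
            n_decoy += 1
        else:
            n_target += 1
    if n_target == 0:
        return 0.0
    # Each counted decoy implies (1 - r) / r incorrect target hits.
    return min(1.0, n_decoy * (1 - r) / r / n_target)


# Made-up example hits, not real data.
hits = (
    [(0.99, "PROT_A")] * 8
    + [(0.95, "DECOY2_PROT_B"), (0.97, "DECOY1_PROT_C")]
    + [(0.50, "PROT_D")] * 2
)
print(decoy_fdr(hits, min_prob=0.9))  # about 0.25: one DECOY2 hit ~ two false hits among 8 targets
```

With r = 0.5 the same DECOY2 hit would only count as one incorrect target hit, so the choice of r directly scales the reported FDR.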

I guess the only thing I'm left wondering is whether the ProphetModels.pl 
help statement might be confusing to others as well? I've always considered a 
'ratio' to generally be between two distinct sets, i.e. target:decoy, rather 
than a subset vs. total. Maybe it could be explicitly stated?

-r <NUM>  -- Specify decoy ratio (decoy/total sequences). Will guess from 
P<0.001 hits if not specified.
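
To make the decoy/total reading concrete, here is a minimal Python sketch of the arithmetic for the 50% target / 25% DECOY1 / 25% DECOY2 database discussed in this thread. The function name `decoy_ratio` and the entry counts are hypothetical, chosen only for illustration. It counts database entries, so given the degenerate-peptide caveat the real ratio will only be roughly equal to these values.

```python
from fractions import Fraction


def decoy_ratio(n_decoy, n_target, n_other=0):
    """Hypothetical helper: decoy ratio as decoy / total entries.

    n_decoy:  decoy entries counted for the FDR estimate
    n_target: target entries
    n_other:  entries left in the total but not counted as decoys
              (e.g. DECOY1 if it is not excluded from the analysis)
    """
    return Fraction(n_decoy, n_decoy + n_target + n_other)


# Database of 100 entries: 50 target, 25 DECOY1, 25 DECOY2 (illustrative counts).
print(decoy_ratio(50, 50))      # both decoy sets counted: 1/2
print(decoy_ratio(25, 50))      # DECOY1 excluded, DECOY2 counted: 1/3
print(decoy_ratio(25, 50, 25))  # DECOY2 over the whole database: 1/4

# The target:decoy reading gives 25/50 = 1/2 for DECOY2 alone, which matches
# the first case numerically but means something different.
print(Fraction(25, 50))
```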

Thanks again.

Dave T



On Thursday, February 6, 2014 3:39:20 PM UTC-6, David Shteynberg wrote:
>
> Hi Dave,
>
> r is computed as Decoy / Total among hits with less than 2% probability.  There is a 
> detailed discussion of this in the iProphet paper.
>
>
> If you have a DB of 50% target 50% decoy and none of the decoys are 
> discarded (which is one way to use your 50%T 25%D1 25%D2) then r = 0.5
>
> If you discard half of the decoys, e.g. D1 is used for modelling and 
> DECOYPROBS is disabled (in which case all D1 get probability 0), then all D1 
> should be excluded from the analysis by ProphetModels.pl. The remaining 
> decoys D2 will then constitute roughly 1/3 of the remaining database 
> entries, and r will be roughly one third (25/75 = 0.3333). In fact, r is 
> related not only to the protein counts but also to the distinct peptides in 
> each set of database entries. Because the original database and the decoys 
> may have degenerate (repeated) peptides, r will be only roughly that 
> fraction and will vary depending on the database, how the decoys are 
> constructed, and how independent D1's decoys are from D2's decoys.
>
> The iProphet paper carries more info on this than I can put in an email, 
> so that's a good reference for this.
>
> Cheers,
> -David  
>
> On Thu, Feb 6, 2014 at 12:17 PM, Dave Trudgian <[email protected]> wrote:
>
>> David, 
>>
>> I just saw Rene's note about the -r 0.25 decoy ratio. I'm similarly using 
>> 2 decoy sets (50% target, 25% DECOY_1, 25% DECOY_2) but with -r 0.5. I had 
>> assumed the ratio was supposed to be specified as decoys_used/targets and 
>> there are twice as many targets as DECOY_2s in my case so -r = 0.5.
>>
>> Having looked in ProphetModels.pl I'm now not so sure.... the estimation 
>> if -r isn't supplied is pp_prob_array / pp_prob_array_decoy for hits with 
>> p<=0.02, but I'm not sure whether this is total/decoy or target/decoy.
>>
>> Can you confirm which approach is correct? 
>>
>> Not a huge problem for me if -r 0.5 is wrong, as I am computing and using 
>> decoy stats elsewhere, external to TPP. It would just mean the plots from 
>> ProphetModels.pl that are being saved are wrong.
>>
>> Thanks,
>>
>> Dave Trudgian
>>
>>
>> On Thursday, December 19, 2013 2:01:53 AM UTC-6, Rene B wrote:
>>>
>>> Hi David, 
>>>
>>> Thank you for your quick reply and suggestions. The decoy ratio is set 
>>> to 0.25 as I use two sets of decoys, one for modeling and the other for 
>>> validation. Each decoy set corresponds to 25% of entries in the database.
>>>
>>> Kind regards,
>>>
>>> Rene
>>>
>>>
>>> On Wednesday, December 18, 2013 8:13:27 PM UTC+1, David Shteynberg wrote:
>>>>
>>>> Hello Rene
>>>>
>>>> Thanks for using the tools and double checking your work.
>>>>
>>>> In my tests I have found that applying the NSP model at the iProphet 
>>>> step greatly improves performance at the peptide level, and applying the 
>>>> NSP model at the ProteinProphet step improves performance at the protein 
>>>> level.  The two models are somewhat different, since the ProteinProphet 
>>>> model considers grouping information while the iProphet model doesn't.  I 
>>>> have not found the two to interfere. 
>>>>
>>>> A safe approach would be to use the more conservative estimate, e.g. the 
>>>> ProteinProphet probability cutoff that gives 1% error with decoys or 1% 
>>>> error with the model, whichever is more conservative.
>>>>
>>>> When the model tends to underestimate error at the protein or peptide 
>>>> level, this usually stems from underestimation at the spectrum level by 
>>>> PeptideProphet and can be controlled by the CLEVEL={value} option for 
>>>> PeptideProphetParser (-c{value} for xinteract).  Setting this to a number 
>>>> greater than zero, like 0.5 or 1 or 2, will make the model more 
>>>> conservative overall; a negative value will have the opposite effect. 
>>>> Either way, the change will carry through to the peptide and protein levels.
>>>>  
>>>> Also I am curious why you set decoy rate to 0.25?
>>>>
>>>> Best,
>>>> David
>>>> On Dec 18, 2013 7:29 AM, "Rene B" <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am running PeptideProphet, iProphet and ProteinProphet (TPP 4.6.3) 
>>>>> on Q Exactive data searched with Comet, Myrimatch and OMSSA. I wondered 
>>>>> if the NSP model should be disabled in ProteinProphet when it is enabled 
>>>>> in iProphet? I got confused because it seems Petunia enables the NSP model 
>>>>> in both iProphet and ProteinProphet by default (i.e. when xinteract runs 
>>>>> with the -ip option).
>>>>>
>>>>> Another question is that when I compare decoy estimated protein FDRs 
>>>>> to ProteinProphet modelled FDRs, ProteinProphet seems a bit optimistic 
>>>>> (decoy based FDR of 0.1% corresponds to ~0.02% model FDR). This is with 
>>>>> NSP enabled in iProphet and disabled in ProteinProphet. How should I deal 
>>>>> with this discrepancy, i.e. should I take the decoy or probability based 
>>>>> FDR to select a probability cutoff?
>>>>>
>>>>> I have attached some examples for a search with myrimatch only. These 
>>>>> are the commands I used to generate the graphs:
>>>>>
>>>>> xinteract -Nmyrimatch.pep.xml -OAP -p0 -a%ExperimentFolder% -dDECOY0 -E%ExperimentTag% *.pep.xml
>>>>> InterProphetParser myrimatch.pep.xml myrimatch.ipro.pep.xml
>>>>> ProphetModels.pl -i myrimatch.ipro.pep.xml -k -r 0.25 -d "DECOY1"
>>>>> ProteinProphet myrimatch.ipro.pep.xml myrimatch.prot.xml IPROPHET NONSP
>>>>> ProtProphModels.pl -k -r 0.25 -d DECOY1 -i myrimatch.prot.xml
>>>>>
>>>>> The graphs are:
>>>>>
>>>>> myrimatch_all.ipro.pep_FDR_10pc: PeptideProphet/iProphet decoy vs 
>>>>> model FDR, all models enabled
>>>>> myrimatch_nonsp.ipro.pep_FDR_10pc: PeptideProphet/iProphet decoy vs 
>>>>> model FDR, NSP model disabled in iProphet
>>>>> myrimatch_nonsp.prot_FDR_5pc: ProteinProphet decoy vs model FDR, NSP 
>>>>> model disabled in ProteinProphet
>>>>> myrimatch_all.prot_FDR_5pc: ProteinProphet decoy vs model FDR, NSP 
>>>>> model enabled in iProphet and ProteinProphet
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Rene
>>>>>
>>>>>
>>>>>  -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "spctools-discuss" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/spctools-discuss.
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>
>>
>
>

