Re: [spctools-discuss] Less protein IDs after running iProphet

Alejandro Fri, 08 Dec 2017 07:23:21 -0800

Hi David,

Thanks for the input. I am now running PeptideProphet with d added to the -O 
option (-Od) and with the minimum probability set to 0, which I guess is why it 
takes more time and produces bigger files, almost a 10x increase. There is a 
difference, but not much. However, I now understand a little better what 
you were referring to. When running like this, I also see in the model page in 
Petunia that the predicted and decoy ROC curves are similar, much like what 
you posted for Florian's data, so I guess I'm on the 
right path.
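
For anyone following along, the comparison between the model-predicted and the decoy-estimated error can be sketched in a few lines of Python. This is only an illustration, not what the TPP tools do internally: the function name and the input format (a list of probability/decoy-flag pairs) are assumptions, the model error is taken as the mean of (1 - probability) over the accepted hits, and the decoy error is taken as decoys divided by accepted hits, the same way the decoy figures are quoted further down in this thread (e.g. 18/1884).

```python
from typing import List, Tuple


def error_rates(psms: List[Tuple[float, bool]], min_prob: float) -> Tuple[float, float]:
    """Return (model_error, decoy_error) for hits with probability >= min_prob.

    psms is a hypothetical list of (probability, is_decoy) pairs, e.g. values
    copied out of a pepXML file by hand or with a parser of your choice.
    """
    accepted = [(p, is_decoy) for p, is_decoy in psms if p >= min_prob]
    if not accepted:
        return 0.0, 0.0
    n = len(accepted)
    # Model-predicted error: expected fraction of incorrect hits among accepted ones.
    model_error = sum(1.0 - p for p, _ in accepted) / n
    # Decoy-estimated error: decoy hits divided by all accepted hits (assumption,
    # mirrors the "decoys / total" figures quoted in this thread).
    decoy_error = sum(1 for _, is_decoy in accepted if is_decoy) / n
    return model_error, decoy_error


# Made-up example values, not real data.
example = [(0.99, False), (0.95, False), (0.90, True), (0.50, False), (0.10, True)]
model_err, decoy_err = error_rates(example, min_prob=0.9)
```

If the two numbers stay close over a range of thresholds, the model and the decoys agree, which is what similar predicted and decoy ROC curves indicate.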


The peptides that I was referring to are not decoys; they are peptides belonging 
to targets, and looking at the spectrum they look fine. The peptides have the 
same sequence but a different label, e.g.  n[29]ELVISK [156] and 
n[35]ELVIS[162], both in the same fraction. Both get a high probability in 
PeptideProphet, but after running iProphet I only see one in Petunia. 
After looking into it, I think it might be something in Petunia, because if I 
check the pepXML manually I see them both, just not in the browser.
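
A manual check like this can be scripted once the modified peptide strings and their probabilities have been pulled out of the pepXML. The sketch below is only an assumption about how one might do it: the helper names are made up, the input is a plain list of (modified peptide, probability) pairs rather than a pepXML file, and the peptides are invented examples using the same n[29]/n[35] style of notation.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches bracketed mass annotations such as "[156]" or "[29]".
_MASS_TAG = re.compile(r"\[[^\]]*\]")


def strip_mods(modified_peptide: str) -> str:
    """Remove mass annotations and lowercase terminal markers (e.g. the leading 'n')."""
    bare = _MASS_TAG.sub("", modified_peptide)
    return "".join(ch for ch in bare if ch.isupper())


def group_labelled_forms(
    hits: Iterable[Tuple[str, float]]
) -> Dict[str, List[Tuple[str, float]]]:
    """Group hits by bare sequence and keep sequences seen with more than one label."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for modified_peptide, probability in hits:
        groups[strip_mods(modified_peptide)].append((modified_peptide, probability))
    return {
        seq: forms
        for seq, forms in groups.items()
        if len({mod for mod, _ in forms}) > 1
    }


# Invented example hits, not taken from any real search.
hits = [
    ("n[29]PEPTIDEK[156]", 0.98),
    ("n[35]PEPTIDEK[162]", 0.97),
    ("n[29]SAMPLER", 0.91),
]
both_labels = group_labelled_forms(hits)
```

Any sequence that shows up in the result with both labelled forms is present in the file, so if the browser shows only one of them the difference is in the display rather than in the data.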

Alejandro

On Tuesday, December 5, 2017 at 4:09:33 PM UTC+1, David Shteynberg wrote:
>
> Hello Alejandro,
>
> If you have decoys in your database, the best comparison would look at the 
> peptide/protein IDs at a set decoy-estimated error rate.  I would suggest 
> you compare the results using Decoy Peptide Validation and Decoy Protein 
> Validation tools to give yourself the most accurate comparison at the 
> decoy-estimated error rate.  I would also suggest you set your minimum 
> PeptideProphet probability to 0 to allow the models in iProphet the most 
> discriminating power between correct and incorrect results. Finally, there is 
> no reason to expect your high scoring PeptideProphet results to remain high 
> scoring after iProphet (what if your high scoring PeptideProphet results are 
> Decoys?)  The goal of iProphet is to identify the correct peptide 
> sequences; this entails pushing down the wrong high scoring results at the 
> PeptideProphet level.  So...rerun the analysis using minimum PeptideProphet 
> probability of 0 and compare the results at the same decoy-estimated error 
> rates at the spectrum, peptide and protein level.  If you still have 
> concerns please link your data so I can download and troubleshoot the 
> analysis.
>
> -David
>
>
>
> On Tue, Dec 5, 2017 at 6:47 AM, Alejandro <[email protected]> wrote:
>
>> Dear all,
>>
>> I would like to reopen this discussion. I have been testing iProphet and 
>> have experienced something similar to what Florian reported.
>>
>> I am searching dimethyl labeled samples, doing two static searches (heavy 
>> and light) with either Comet or X!Tandem. I then combine both searches 
>> (heavy and light) with PeptideProphet and run ProteinProphet, "as is" and 
>> also using the MPT for a 0.01 error in ProteinProphet. Then I use the basic 
>> PeptideProphet results (run with P0.05) of Comet and X!Tandem to combine 
>> both results using iProphet with default parameters and selecting 
>> ProteinProphet. I would expect this to increase the IDs, or at 
>> least to give the same number as either search engine alone. However, this 
>> is not the case, e.g.
>>
>> ProteinProphet results filtered to 0.01 error
>>
>> Comet
>> 1809 (238 single hits) = 1571
>>
>> Tandem
>> 1498 (152 single hits) = 1346
>>
>> Comet and Tandem combined with iProphet
>> 1623 (376) = 1247
>>
>> Comet and Tandem combined with iProphet without NSP model
>>
>> 1717 (366) = 1351
>>
>> So, the single peptide hits appear to increase when combining, and in the 
>> end there are fewer proteins identified with more than 1 peptide.
>>
>> When looking at the models of each search engine, there's a good 
>> separation of both distributions.
>>
>> Furthermore, when looking at specific proteins I have found that 
>> peptides with a PeptideProphet probability above 0.9 (above my MPT for 
>> 0.01) in both Comet and X!Tandem are gone when combining with iProphet. 
>> Why could this be happening? Shouldn't these get an even higher probability?
>>
>> I hope one of you can give me a hint on this.
>>
>> Cheers,
>>
>> Alejandro
>>
>>
>> On Friday, March 20, 2015 at 2:55:39 PM UTC+1, Florian wrote:
>>>
>>> Hej David,
>>>
>>> sorry for my late response, I wanted to do some proper testing before 
>>> reporting back to you. I did the following tests now, always using the -Od 
>>> option and checking manually for Decoy hits:
>>>
>>> a) I reran the analysis that I posted above with these results:
>>>
>>> 1) X!Tandem only without iProphet: 1884 (557), 1% model estimated error, 
>>> 18/1884 = 0.95% Decoy estimated error
>>> 2) X!Tandem only with iProphet: 1760 (854), 0.9% model estimated error, 
>>> 6/1760 = 0.34% Decoy estimated error
>>> 3) MSGF only without iProphet: 2138 (632), 1% model estimated error, 
>>> 31/2138 = 1.4% Decoy estimated error
>>> 4) MSGF only with iProphet: 1975 (876), 0.8% model estimated error, 
>>> 8/1975 = 0.4% Decoy estimated error
>>> 5) X!Tandem and MSGF without iProphet: 2176 (590), 0.5% model estimated 
>>> error, 28/2176 = 1.2% Decoy estimated error
>>> 6) X!Tandem and MSGF with iProphet: 2154 (1057), 1% model estimated 
>>> error, 17/2154 = 0.8% Decoy estimated error
>>>
>>> b) I included a 5% FDR on peptide level after PeptideProphet, based on 
>>> the model estimate:
>>>
>>> 1) X!Tandem only without iProphet: 1847 (569), 1% model estimated error, 
>>> 13/1847 = 0.7% Decoy estimated error
>>> 2) X!Tandem only with iProphet: 1756 (848), 0.9% model estimated error, 
>>> 5/1756 = 0.3% Decoy estimated error
>>> 3) MSGF only without iProphet: 2125 (647), 1% model estimated error, 
>>> 29/2125 = 1.3% Decoy estimated error
>>> 4) MSGF only with iProphet: 1975 (875), 0.8% model estimated error, 
>>> 8/1975 = 0.4% Decoy estimated error
>>> 5) X!Tandem and MSGF without iProphet: 2216 (655), 0.9% model estimated 
>>> error, 37/2216 = 1.6% Decoy estimated error
>>> 6) X!Tandem and MSGF with iProphet: 2157 (1065), 1% model estimated 
>>> error, 20/2157 = 0.9% Decoy estimated error
>>>
>>> c) I used the less redundant swissprot database:
>>>
>>> 1) X!Tandem only without iProphet: 1812 (419), 0.7% model estimated 
>>> error, 15/1812 = 0.8% Decoy estimated error
>>> 2) X!Tandem only with iProphet: 1671 (752), 0.7% model estimated error, 
>>> 8/1671 = 0.5% Decoy estimated error
>>> 3) MSGF only without iProphet: 2077 (602), 0.8% model estimated error, 
>>> 0/2077 = 0% Decoy estimated error
>>> 4) MSGF only with iProphet: 1945 (855), 0.8% model estimated error, 
>>> 0/1945 = 0% Decoy estimated error
>>> 5) X!Tandem and MSGF without iProphet: 2238 (622), 1% model estimated 
>>> error, 51/2187 = 2.3% Decoy estimated error
>>> 6) X!Tandem and MSGF with iProphet: 2147 (1031), 0.9% model estimated 
>>> error, 21/2144 = 1% Decoy estimated error
>>>
>>> My conclusions:
>>> - shutting off the IPROPHET option in ProteinProphet makes the model 
>>> underestimate the error. However, it is not far off.
>>> - enabling the IPROPHET option in ProteinProphet gives a good error 
>>> estimation, but one loses a lot of peptides from proteins that were 
>>> identified by more than one peptide. As I understand the iProphet 
>>> algorithm, such peptides should rather get a higher probability in 
>>> iProphet.
>>>
>>> I also uploaded the data to my dropbox and will send you the link.
>>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "spctools-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/spctools-discuss.
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
