Hello Sud,

Please remember that the job of the TPP is to separate the correct results 
(signal) from the random matches (noise) at the PSM level, the peptide level 
(PTM level) and the protein level.  Generally speaking, most datasets contain 
a large portion of incorrect PSMs, which by the nature of statistics will 
match random peptides and proteins selected from the search database.  It is 
standard in our lab to see large analyses of tens of millions of correct PSMs 
that map to tens of thousands of correct peptides and perhaps only a few 
thousand or so proteins, depending on the sample (e.g. blood plasma).  So I am 
not concerned about the total number of proteins you are seeing after 
applying statistical cut-offs.

Certainly there are other models you can try in PeptideProphet that might 
alter your results slightly.  To take a deeper look at your data, you have to 
find where other correct PSMs might be recovered.  For example, you can try a 
semi-tryptic (or unconstrained) search, as we did in this thread, if you think 
the digestion was the issue.  You can also search for additional PTMs that are 
present in the sample but missed by the existing search.

I recommend you focus on the sensitivity of your analysis rather than on the 
absolute total of proteins identified without consideration for error.  The 
goal of these tools is to give you an accurate, user-controlled error rate 
while maximizing the sensitivity (the number of correct identifications 
accepted out of the total correct identifications possible in the entire 
analysis).  One way to apply the TPP is to use the results to improve the 
laboratory methods so as to maximize the return of correct proteins from the 
samples.  Replicates, both biological and technical, also help to separate 
the correct signal from random noise.
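
To make the error-rate versus sensitivity trade-off concrete, here is a 
minimal Python sketch.  It assumes you already have one PeptideProphet (or 
iProphet) probability per PSM in a plain list, and it treats each probability 
as the chance that the PSM is correct, so the expected number of wrong IDs 
above a cut-off is the sum of (1 - p).  The function name and the example 
probabilities are made up for illustration; this is not TPP code.

```python
def error_and_sensitivity(probabilities, min_prob):
    """Estimate error rate and sensitivity at a probability cut-off.

    Assumption: each value in `probabilities` is the model's probability
    that the corresponding PSM is correct.
    """
    accepted = [p for p in probabilities if p >= min_prob]

    # Expected correct IDs in the whole dataset and among accepted PSMs.
    total_correct = sum(probabilities)
    accepted_correct = sum(accepted)

    # Expected incorrect IDs among the accepted PSMs.
    accepted_incorrect = sum(1.0 - p for p in accepted)

    error_rate = accepted_incorrect / len(accepted) if accepted else 0.0
    sensitivity = accepted_correct / total_correct if total_correct > 0 else 0.0

    return {
        "accepted": len(accepted),
        "error_rate": error_rate,
        "sensitivity": sensitivity,
    }


if __name__ == "__main__":
    # Hypothetical probabilities: a few good PSMs and many random matches.
    example = [0.99, 0.95, 0.90, 0.40, 0.10, 0.0, 0.0, 0.0]
    for cutoff in (0.0, 0.5, 0.9):
        print(cutoff, error_and_sensitivity(example, cutoff))
```

With no cut-off every PSM is counted, which is why the unfiltered totals look 
so large; raising the cut-off lowers the estimated error rate at some cost in 
sensitivity.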

Hope this helps!

-David

> On Jul 30, 2024, at 5:52 AM, sudarshan kumar <[email protected]> wrote:
> 
> Hi David,
> Thank you for clearing my doubts. 
> 
> I have a few more queries.
> You said: "The reason you are seeing many more protein numbers in the PepXML 
> Viewer (Summary Tab) as opposed to after running ProteinProphet is likely 
> because you haven’t applied any threshold filtering to the probability (or 
> other scores). You are seeing all the hits here as opposed to the “likely 
> correct” hits."
> 
> I tried to analyze other run files. This is a blood sample run on an 
> Orbitrap Fusion. The total number of scans is around 88,000 (I consider 
> that a high number).
> Up to the Comet search there are many peptide hits (up to 50,000). But as 
> soon as I apply the validation models (PeptideProphet or iProphet), the 
> number of unique peptides falls to 300. This drastic reduction in the 
> number of accurate peptides, and hence proteins as well, forces me to think 
> that I am not using the correct statistical models.
> 
> I find it unbelievable that such a large number of PSMs yields only 300 
> proteins, and in blood at that. 
> <image.png>
> 
> Original, without applying the error filter:
> <image.png>
> 
> On Mon, Jul 29, 2024 at 8:02 AM David Shteynberg <[email protected]> wrote:
>> Hello Sud,
>> 
>> No problem!    
>> 
>> There is a difference between PeptideProphet reporting a probability of 
>> “0” for a PSM and reporting a probability of “0.0000”.  The lone zero “0” 
>> means that PeptideProphet did not find a successful model for the mixture 
>> distribution of correct and incorrect results and returned no model, as 
>> opposed to a model that gave a low probability.  So, if all your 
>> probabilities come back as “0”, there is no model, and you have to either 
>> adjust the analysis model or the search parameters, or look for another 
>> issue with the data.  When you see “0.0000”, the spectrum received a low 
>> score based on the model that was returned.
>> 
>> The reason you are seeing many more protein numbers in the PepXML Viewer 
>> (Summary Tab) as opposed to after running ProteinProphet is likely because 
>> you haven’t applied any threshold filtering to the probability (or other 
>> scores). You are seeing all the hits here as opposed to the “likely correct” 
>> hits.
>> 
>> Regarding the Butyrophilin, it appears to have several isoforms.  The 
>> first one, which got the high probability, is necessary to explain all the 
>> observed peptides for this isoform protein family group.  The other 
>> proteins in the family share many of their peptides with the top hit and 
>> come along for the ride, without having independent peptide evidence that 
>> would distinguish them from the other isoforms.
>> 
>> Please let me know if you have further questions.
>> 
>> Cheers!
>> -David 
>> 
>> 
>>> On Jul 29, 2024, at 4:55 AM, sudarshan kumar <[email protected]> wrote:
>>> 
>>> Hi David,
>>> Thank you so much for doing an exhaustive study of the data. 
>>> Yes, I agree with your keen observation that the data look like poorly 
>>> digested peptides. I repeated the analysis with both semi-tryptic and 
>>> fully tryptic searches. I was getting 0 probability for the fully tryptic 
>>> Comet search (though the Comet search itself contained correct hits), 
>>> while the semi-tryptic search (with both PeptideProphet and iProphet) 
>>> returned proteins with good probability. 
>>> I wonder: in the summary tab I see around 300 proteins identified, but 
>>> when I look at the protein detail sheet (sorted by PSM number) it shows 
>>> hardly 10-12 proteins with at least 1 PSM. All the remaining entries have 
>>> 0 PSMs. Why is this? Please explain. Can you please also explain why the 
>>> top hit (by number of PSMs) has more than 150 PSMs, while the list of 
>>> identified proteins (with at least one PSM) is very small, hardly 10-12? 
>>> I expected more hits, with PSMs distributed more evenly across the 
>>> proteins. 
>>> 
>>> Thank you so much!
>>> 
>>> 
>>> 
>>> On Sat, Jul 27, 2024 at 4:13 AM David Shteynberg <[email protected]> wrote:
>>>> Hello again Sud,
>>>> 
>>>> -- 
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "spctools-discuss" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to [email protected] 
>>>> <mailto:[email protected]>.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/spctools-discuss/E48C9F54-97ED-4825-AA14-CF84A8956729%40systemsbiology.org
>>>>  
>>>> <https://groups.google.com/d/msgid/spctools-discuss/E48C9F54-97ED-4825-AA14-CF84A8956729%40systemsbiology.org?utm_medium=email&utm_source=footer>.
>>>> 
>>>> First, you can find my comet.params file attached.  It uses a set of 
>>>> parameters that I selected after playing a bit more with your dataset 
>>>> to try to discover some other reason why you might be getting a low 
>>>> number of correct IDs.  One thing I am noticing (after performing a 
>>>> semi-tryptic search with Comet) is that the majority of correct peptide 
>>>> IDs are semi-tryptic.  This is expected among incorrect results, but 
>>>> among correct results it indicates a potential issue with the tryptic 
>>>> digestion of the sample.  The model for NTT (number of tolerable 
>>>> termini) is learned automatically by PeptideProphet and is pasted here:
>>>> 
>>>> 
>>>> I recommend searching this data without strict tryptic-end requirements 
>>>> on the peptides.
>>>> 
>>>> Cheers!
>>>> -David
>>>> 
>>>> 
>>>>> On Jul 26, 2024, at 10:18 AM, sudarshan kumar <[email protected]> wrote:
>>>>> 
>>>>> Thank you so much to both of you, Luis and David. It was worth it. 
>>>>> Otherwise, every time we used to work with the data class comet file. 
>>>>> 
>>>>> It worked. 
>>>>> 
>>>>> On Fri, Jul 26, 2024, 22:14 Jimmy Eng <[email protected]> wrote:
>>>>>> Just so that you're aware, this can also be downloaded from the Comet 
>>>>>> website for each release.   Here's the parameters page for the 2024.01 
>>>>>> release <https://uwpr.github.io/Comet/parameters/parameters_202401/> and 
>>>>>> you can find the parameters page for all prior releases here 
>>>>>> <https://uwpr.github.io/Comet/parameters/>.  Every parameter is 
>>>>>> described and example comet.params files for each release version can be 
>>>>>> downloaded at the head of each parameters release page.
>>>>>> 
>>>>>> On Friday, July 26, 2024 at 4:47:26 AM UTC-7 sudarshan kumar wrote:
>>>>>> Luis,
>>>>>> Thank you so much. I could do it. 
>>>>>> Best 
>>>>>> Sud
>>>>>> 
>>>>>> On Fri, Jul 26, 2024 at 1:30 PM 'Luis Mendoza' via spctools-discuss 
>>>>>> <spctools...@googlegroups.com> wrote:
>>>>>> Hello,
>>>>>> You can create a comet parameters file using Petunia.  Simply choose the 
>>>>>> "Files" menu, then go to the desired directory (or create a new one), 
>>>>>> and then look for and click on the "New" button at the bottom of the 
>>>>>> window; you can then choose to create a new file and give it any name 
>>>>>> you want:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Hope this helps,
>>>>>> --Luis
>>>>>> 
>>>>>> 
>>>>>> On Fri, Jul 26, 2024 at 12:23 AM sudarshan kumar <[email protected]> wrote:
>>>>>> Please share a Notepad (plain-text) version of the comet.params file 
>>>>>> for version 2024. 
>>>>>> 
>>> 
>>> 
>>> -- 
>>> -------------------------------------------------------------------
>>> The real voyage of discovery consists not in seeking new lands but seeing 
>>> with new eyes. — Marcel Proust
>>> 
>>> Dr. Sudarshan Kumar
>>> (Fulbright-Nehru Fellow)
>>> (B.V.Sc. & A.H., M.V.Sc., PhD.)
>>> Sr. Scientist
>>> Animal Biotechnology Center
>>> (Proteomics and Cell Biology Lab.)
>>> National Dairy Research Institute Karnal, 132001
>>> Haryana, India
>>> Contact No 09254912456
>>> URL www.ndri.res.in
>>> Orcid Id: https://orcid.org/0000-0002-9816-4307
>>> 
>>> 
> 
> 
> 

-- 
You received this message because you are subscribed to the Google Groups 
"spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/spctools-discuss/223C4760-0503-464D-B7C9-89B0DC35E03E%40systemsbiology.org.
