Hello Ira,

You make very good observations, and you are absolutely correct in your interpretation of how ProteinProphet should work.
The bad: A few years ago we noticed the behavior you just described in ProteinProphet, in spite of what the publication claimed. It turns out that the rule to ignore all peptides with weights less than 50%, when applied to certain complex protein groups, would often produce clusters with a high group probability (the peptides were indeed validated with high probability) but with low or zero protein probabilities (no peptide was assigned to any member protein with a weight above 50%).

The ugly: We introduced a quick fix via a parameter, GROUPWTS. With this (advanced) parameter, the algorithm would compute each peptide's weight within the protein group and calculate protein probabilities using those weights, even if they were below the 0.5 threshold. While this boosted the probabilities of proteins within groups, the implementation did not lead to a parsimonious (Occam's razor) solution, so some proteins ended up with (inconsistently) inflated probabilities.

The good! A full fix is now included in the latest release of TPP (4.8.0 - out tomorrow!), and it is the default mode when running ProteinProphet -- no options need to be set. Please re-run your analysis with 4.8.0 when you get a chance, and let us know whether it behaves as expected with your data. I also suggest using the new ProteinProphet viewer (now the default), which lets you visualize the mapped peptides graphically by clicking on the Group Entry number -- I can provide more details if you have any questions.

Lastly, in order to provide the old functionality with the new software, we kept the GROUPWTS behavior, though we renamed the option to SOFTOCCAM; the old ProteinProphet way of apportioning weights can be recreated with the NOGROUPWTS option. Again, these are advanced options that most users will never need to invoke; all are listed in the usage statement -- type 'ProteinProphet' by itself on the command line and press return.

Hope this answers your query and solves the issue. Do let us know if you find more odd behavior, and thanks for reporting it.
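To make the effect of the weight threshold concrete, here is a minimal sketch in Python. It is not the ProteinProphet code: it assumes the simplified form in which a protein's probability is 1 minus the product of (1 - weight * peptide probability) over its peptides, and the function names, the example group of five peptides, and their probabilities and weights are all made up for illustration (the weights of 0.2 only mirror the values reported for group 240).

```python
from math import prod

# Weight threshold described above: peptides below it were ignored
# when computing a protein's probability.
MIN_WEIGHT = 0.5


def protein_probability(peptides, min_weight=MIN_WEIGHT):
    """Simplified protein probability (an assumption, not the TPP source).

    peptides: list of (weight, probability) pairs for one protein.
    Peptides whose weight is below min_weight contribute nothing.
    """
    return 1.0 - prod(1.0 - w * p for w, p in peptides if w >= min_weight)


def group_probability(peptide_probs):
    """Probability that at least one peptide in the group is correct."""
    return 1.0 - prod(1.0 - p for p in peptide_probs)


# Hypothetical group: five confidently identified peptides, each shared
# so evenly among the member proteins that every weight is about 0.2.
peptide_probs = [0.99] * 5
member_peptides = [(0.2, p) for p in peptide_probs]

# Thresholded rule: every peptide is dropped, so the member protein gets 0.
thresholded = protein_probability(member_peptides)

# Using the weights even below the threshold (in the spirit of the
# GROUPWTS quick fix) gives each member a substantial probability.
unthresholded = protein_probability(member_peptides, min_weight=0.0)

# The group as a whole is still supported by all five peptides.
group = group_probability(peptide_probs)
```

Under these assumptions the thresholded member probability is exactly 0 while the group probability is essentially 1, which is the pattern reported for group 240. Dropping the threshold raises every member at once, which illustrates why that approach on its own does not give a parsimonious answer.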
--Luis

On Mon, Nov 17, 2014 at 5:55 PM, Ira Cooke <[email protected]> wrote:
> I have a protXML file that I generated using TPP 4.7.1
>
> The file is on dropbox at this link
>
> https://dl.dropboxusercontent.com/u/226794/47.prot.xml
>
> I get very similar results with TPP 4.6.3 on the same dataset ..
>
> https://dl.dropboxusercontent.com/u/226794/463.txt
>
> There are several protein groups in the file whose weights and probabilities I'm struggling to understand. In particular group number 240
>
> The group probability for this group is 1.0, but all of the member probabilities are 0. Reading back over older posts on this forum I found several posts that seemed relevant to the situation;
>
> https://groups.google.com/forum/#!searchin/spctools-discuss/group_probability/spctools-discuss/UGQ4KrWr56k/O1FQ15D0KzEJ
>
> http://groups.google.com/group/spctools-discuss/browse_thread/thread/29f91682aaae75f7/b275563afb68ed9e?lnk=gst&q=gattaca#b275563afb68ed9e
>
> And I also found older posts mentioning something about a cut-off with respect to the peptide weights that would influence whether a peptide would be regarded as contributing evidence for a protein.
>
> I'm struggling to put all of this together because when I read the Protein Prophet paper (esp Figure 4 and text on that page), my understanding of its algorithm is that when several proteins have shared peptides, but one protein contains all of the peptides in all the other group members (as is the case in the example I've provided) .. then the final result will be that all of the peptides will have weights of 1.0 to the protein that contains them all ... but will be assigned weights of 0.0 to other members of the group. This makes sense to me as it satisfies Occam's razor ... since one protein could explain the presence of all these peptides.
>
> If you take a look at the protXML file provided and at group number 240 you'll see that the peptide weights are all quite low (around 0.2) ... and are similar across the various member proteins.
>
> If I take a look at other groups in the file (eg group number 246) I see behaviour much closer to what I would expect ... all weights are 1.0 on the protein that contains all the peptides.
>
> Can anyone provide an explanation as to what is happening here? My worry is that this is due to the model not converging? ... Or (more likely) I just don't understand the meaning behind the various Protein Prophet outputs.
>
> --
> You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/spctools-discuss.
> For more options, visit https://groups.google.com/d/optout.
