Hello,

I've got the TPP working well on unix, processing the results of multiple search engines very efficiently. Thanks for the help that's got me this far. However, I have a number of outstanding questions that I'd like to understand before I really trust my use of the pipeline. Hopefully someone can help. If I've missed something in the documentation or in the papers, please point me to it.
*1: Model failure*

When a model for a charge state fails ("Mixture model quality test failed for charge (4+)"), what happens to the data from that sample and charge state? Is it excluded from the rest of the pipeline?

*2: Parametric vs. semi-parametric modelling*

In many of my samples I find more proteins with non-parametric modelling than with parametric modelling. Do you expect non-parametric modelling to be less conservative, or is this very data-set dependent? Is semi-supervised / semi-parametric modelling normally preferred because it makes fewer assumptions about the data?

In the FAQ page "What is CLEVEL and how do I use it?" there are nice plots that I can imagine being helpful for making decisions about model performance. Is there an easy way to produce these plots? If I copy the data over to look at in the Petunia GUI on Windows I can't find any such plots, but I've seen them in another question in this discussion group, which makes me think I'm missing something...

*3: Combining search engine results run with different parameters*

There's supposed to be no assumption of orthogonality when combining the results of multiple search engines in the TPP. So is it also acceptable to run the same search engine with multiple parameter sets (e.g. with and without certain variable modifications) and then combine these results? Is there no danger that this will artificially inflate the probability of a protein, because the search space is made to appear artificially small?

*4: The function of some of the scripts*

I've worked out how to run the TPP on unix by running it through the Windows GUI and looking at the command list. However, I'm really not clear about the precise function of some of the programs and scripts. When combining the results of multiple search engines I run the following programs. Below I give a brief description of how I use each one and what I think it does.
Any corrections or elaborations would be appreciated:

*InteractParser* - Here I combine different pep.xml files from technical replicates and set the experiment and enzyme tags.

*DatabaseParser* - Is this necessary? How does it alter the pepXML files?

*RefreshParser* - I use this to make sure that all pepXML files reference the same database. This is necessary because search engines have different database requirements: some generate the decoy database themselves, while others require it to be appended to the real database. So here I make sure all the files reference the non-appended version; otherwise I presume there would be problems downstream when combining results that derive from 'different' databases. Is this use of RefreshParser necessary / appropriate?

*PeptideProphetParser* - Runs PeptideProphet.

*ProphetModels.pl* - Does this alter the pepXML file?

*tpp_models.pl* - Does this alter the pepXML file?

*InterProphetParser* - Runs iProphet, combining the different search results. I presume that this uses the appropriate decoy tag for each file, as provided to PeptideProphetParser?

*RefreshParser* - This second run seems to be necessary, though I'm not quite sure why.

*ProteinProphet* - Runs ProteinProphet on the iProphet results.

Thanks!
Alastair
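P.S. For reference, here is roughly how I invoke the steps above on the command line. This is a sketch, not a definitive recipe: the file names, the database path, and the "DECOY_" prefix are placeholders from my own setup, and the option spellings (DECOY=, NONPARAM, IPROPHET) are how I understand them from the TPP usage output, so please correct me if any are wrong.

```shell
# Sketch of my multi-search-engine pipeline. All file names, the FASTA
# path, and the DECOY_ prefix are placeholders from my setup.

# 1. Merge technical replicates into one pep.xml per search engine
#    (I set the experiment and enzyme tags here; flags omitted, see
#    InteractParser's usage output for the exact option names).
InteractParser comet.interact.pep.xml comet_rep1.pep.xml comet_rep2.pep.xml
InteractParser tandem.interact.pep.xml tandem_rep1.pep.xml tandem_rep2.pep.xml

# 2. Point every pep.xml at the same non-decoy-appended database
RefreshParser comet.interact.pep.xml /data/db/target.fasta
RefreshParser tandem.interact.pep.xml /data/db/target.fasta

# 3. Run PeptideProphet on each engine's results
#    (NONPARAM selects the non-parametric model I asked about above)
PeptideProphetParser comet.interact.pep.xml DECOY=DECOY_ NONPARAM
PeptideProphetParser tandem.interact.pep.xml DECOY=DECOY_ NONPARAM

# 4. Combine the engines with iProphet (output file comes last)
InterProphetParser DECOY=DECOY_ comet.interact.pep.xml \
    tandem.interact.pep.xml combined.ipro.pep.xml

# 5. Refresh database references again, then run ProteinProphet
#    on the iProphet output
RefreshParser combined.ipro.pep.xml /data/db/target.fasta
ProteinProphet combined.ipro.pep.xml combined.prot.xml IPROPHET
```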