Hello,

So I've got TPP working well on Unix, processing the results of multiple 
search engines very efficiently. Thanks for the help that got me this far. 
However, I have a number of outstanding questions that I'd like to 
understand before I really trust my use of the pipeline. Hopefully someone 
can help. If I've missed something in the documentation or in the papers, 
please let me know. 


*1: Model failure*

When the model for a charge state fails ("Mixture model quality test failed 
for charge (4+)"), what happens to the data from that sample and charge 
state? Is it excluded from the rest of the pipeline?

*2: Parametric vs. semi-parametric modelling*

In many of my samples I find more proteins with non-parametric modelling 
than with parametric modelling. Do you expect non-parametric modelling to 
be less conservative, or is this very dataset-dependent? Is the 
semi-supervised / semi-parametric modelling normally preferred because it 
makes fewer assumptions about the data?

On the FAQ page "What is CLEVEL and how do I use it?" there are nice plots 
that I can imagine being helpful when making decisions about model 
performance. Is there an easy way to produce these plots? If I copy the 
data over to look at in the Petunia GUI on Windows I can't find any such 
plots, but I've seen them in another question in this discussion group, 
which makes me think I'm missing something... 

*3: Combining search engine results run with different parameters*

There's supposed to be no assumption of orthogonality when combining the 
results of multiple search engines in the TPP. So is it also acceptable to 
run the same search engine with multiple parameter sets (e.g. with and 
without certain variable modifications) and then combine those results? Is 
there no danger that this will artificially inflate the probability of a 
protein, because the search space is made to appear artificially small?

*4: The function of some of the scripts...*

I've worked out how to run TPP on Unix by running it through the Windows 
GUI and looking at the command list. However, I'm really not clear about 
the precise function of some of the programs and scripts. If I'm combining 
the results of multiple search engines, I run the programs below. For each 
one I give a brief description of how I use it and what I think it does. 
Any corrections or elaborations would be appreciated:

*InteractParser*
Here I combine the different pep.xml files from technical replicates and 
set the experiment and enzyme tags.


*DatabaseParser*
Is this necessary? How does it alter the pepXML files?


*RefreshParser*
I use this to make sure that all pepXML files reference the same database. 
This is necessary because search engines have different database 
requirements - some generate the decoy database themselves while others 
require it appended to the real database. So here I make sure all the 
files reference the non-appended version - otherwise I presume there would 
be problems downstream when combining results that derive from 'different' 
databases. *Is this use of RefreshParser necessary / appropriate?*


*PeptideProphetParser*
Runs PeptideProphet.


*ProphetModels.pl*
Does this alter the pepXML file?


*tpp_models.pl*
Does this alter the pepXML file?


*InterProphetParser*
Runs iProphet, combining the different search results. I presume that this 
uses the appropriate decoy tag for each file, as provided to 
PeptideProphetParser?


*RefreshParser*
Seems to be necessary. Not quite sure why.

*ProteinProphet*
Runs ProteinProphet on the iProphet results.
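
For reference, here is the whole sequence roughly as I run it, written as a 
dry-run script that just prints the commands. The file names, database path 
and decoy prefix are placeholders from my setup, and the exact option 
spellings may differ between TPP versions - please correct me if any of 
these are wrong:

```shell
#!/bin/sh
# Dry-run sketch of my TPP command sequence (placeholders, not a verified recipe).
# Commands are echoed rather than executed, so the order is visible without a TPP install.

DB=/data/db/target_plus_decoy.fasta   # the single non-appended database all files should reference
DECOY=DECOY_                          # decoy accession prefix used by all searches (assumed)

# 1. Merge technical replicates and set the experiment/enzyme tags
echo "InteractParser interact.engine1.pep.xml rep1.engine1.pep.xml rep2.engine1.pep.xml"

# 2. Point every pepXML file at the same database
echo "RefreshParser interact.engine1.pep.xml $DB"

# 3. PeptideProphet, run separately per search engine
echo "PeptideProphetParser interact.engine1.pep.xml DECOY=$DECOY"

# 4. iProphet across the per-engine results (last argument is the output file)
echo "InterProphetParser DECOY=$DECOY interact.engine1.pep.xml interact.engine2.pep.xml interact.ipro.pep.xml"

# 5. Refresh again, then ProteinProphet on the iProphet output
echo "RefreshParser interact.ipro.pep.xml $DB"
echo "ProteinProphet interact.ipro.pep.xml interact.prot.xml IPROPHET"
```

If any of those steps is redundant, or in the wrong order, that would be 
very useful to know.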



Thanks!
Alastair

-- 
You received this message because you are subscribed to the Google Groups 
"spctools-discuss" group.
Visit this group at https://groups.google.com/group/spctools-discuss.