Hello,
So I've got TPP working well on Unix, processing the results of multiple
search engines very efficiently. Thanks for the help that's got me this far.
However I have a number of outstanding questions that I'd like to
understand before I really trust my use of the pipeline. Hopefully someone
can help. If I've missed something in the documentation or in the papers
please let me know.
*1: Model failure*
When a model for a charge state fails ("Mixture model quality test failed
for charge (4+)"), what happens to data from that sample and charge state?
Is it excluded from the rest of the pipeline?
*2: Parametric vs. semi-parametric modelling*
In many of my samples I find more proteins with non-parametric modelling
than with parametric modelling. Do you expect non-parametric modelling to
be less conservative, or is this very data-set dependent? Is the
semi-supervised / semi-parametric modelling normally preferred because it
makes fewer assumptions about the data?
In the FAQ page "What is CLEVEL and how do I use it?" there are nice plots
that I can imagine being helpful for making decisions about model
performance. Is there an easy way to produce these plots? When I copy the
data over to look at in the Petunia GUI on Windows I can't find any such
plots, but I've seen them in another question in this discussion group,
which makes me think I'm missing something...
*3. Combining search engine results run with different parameters*
There's supposed to be no assumption of orthogonality when combining the
results of multiple search engines in the TPP. So is it also acceptable to
run the same search engine several times with different parameters (e.g.
with and without certain variable modifications) and then combine those
results? Is there no
danger that this will artificially inflate the probability of a protein,
because the search space is made to appear artificially small?
*4. The function of some of the scripts...*
I've worked out how to run TPP on Unix by running it in the Windows GUI and
looking at the command list. However, I'm really not clear about the precise
function of some of the programs and scripts. If I'm combining the results
of multiple search engines then I run the following programs. Below I give
a brief description of how I use each one and what I think it does. Any
corrections or elaborations would be appreciated:
*InteractParser*
Here I combine the pep.xml files from technical replicates and set the
experiment and enzyme tags.
*DatabaseParser*
Is this necessary? How does it alter the pepXML files?
*RefreshParser*
I use this to make sure that all pepXML files are referencing the same
database. This is necessary because some search engines have different
database requirements - some generate the decoy database themselves while
other require it appended to the real database. So here I make sure all the
files reference the non-appended version - otherwise I presume there would
be problems when combining the results downstream when they derive from
'different' databases. *Is this use of RefreshParser necessary /
appropriate?*
*PeptideProphetParser*
Runs PeptideProphet.
*ProphetModels.pl*
Does this alter the pepXML file?
*tpp_models.pl*
Does this alter the pepXML file?
*InterProphetParser*
Runs iProphet, combining the different search results. I presume that this
picks up the appropriate decoy tag for each file, as provided to
PeptideProphetParser?
*RefreshParser*
Seems to be necessary. Not quite sure why.
*ProteinProphet*
Runs ProteinProphet on the iProphet results.
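For concreteness, here is roughly the command sequence I've reconstructed from the GUI's command list. The file names, database path, and DECOY_ prefix are placeholders from my own setup, and I may well have some of the options wrong - corrections welcome:

```
# 1. Per search engine: combine technical replicates, set experiment/enzyme tags
InteractParser interact.engineA.pep.xml rep1.engineA.pep.xml rep2.engineA.pep.xml -Etrypsin

# 2. Point every pepXML at the same non-appended database
RefreshParser interact.engineA.pep.xml /data/db/target.fasta

# 3. PeptideProphet, per search engine, with the decoy prefix
PeptideProphetParser interact.engineA.pep.xml DECOY=DECOY_ NONPARAM

# 4. iProphet across the engines
InterProphetParser DECOY=DECOY_ interact.engineA.pep.xml interact.engineB.pep.xml iprophet.pep.xml

# 5. Refresh again (this is the step I don't understand), then ProteinProphet
RefreshParser iprophet.pep.xml /data/db/target.fasta
ProteinProphet iprophet.pep.xml iprophet.prot.xml IPROPHET
```

If any of those invocations look wrong for this workflow, that may well explain some of my confusion above.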
Thanks!
Alastair
--
You received this message because you are subscribed to the Google Groups
"spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/spctools-discuss.
For more options, visit https://groups.google.com/d/optout.