[spctools-discuss] Re: Reg:TPP combatibility with Shimadzu MALDI-TOF-TOF data

mnotes Wed, 18 Feb 2009 07:15:10 -0800

Hi David:
Thank you, David, for the interesting conversation. The world you just
described does not exist, fortunately I would say, and it is probably
time to think about the real world, which lies outside of all-round but
practically useless mathematical models.
Sergey


On Feb 18, 12:55 am, David Shteynberg <dshteynb...@systemsbiology.org>
wrote:
> Thank you for this interesting discussion.  My original point was to
> say that data collected on the same type of instrument and searched
> against the same database with the same parameters can be analyzed
> together in PeptideProphet.
>
> I accept your point that in the case of the additional models that
> PeptideProphet applies, specifically the models that deal with
> features of identified peptides (Number of Tolerable Termini, Number
> of Missed Cleavages, etc.), combining data "leads (at some conditions)
> to strong cross-talk between data and incorrect estimation of
> probabilities in combined data set", especially if there is reason to
> expect that the quality of enzymatic digestion would vary between the
> two datasets. However, if the searches are done in the same way
> against the same database, there is nothing that should stop one from
> combining independent datasets collected on the same machine using
> models that would not be expected to vary between the experiments
> being combined: the search engine f-value model, or the mass accuracy
> model, which models a feature of the instrument.
>
> > "already "interact" with each other in real life just by virtue of the
> > instrument itself"
> > If you see that, it is time to call service and clean the instrument.
>
> Core facilities run standards for this very reason.
>
> -David
>
>
>
> On Tue, Feb 17, 2009 at 5:39 PM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
>
> > I disagree that
> > "finding is that these datasets are not truly independent because as
> > you note if they
> > were truly independent datasets the probabilities would not change.
> > Different datasets that are collected on the same instrument (or same
> > type of instrument) already "interact" with each other in real life
> > just by virtue of the instrument itself."
>
> > Because what you actually see is an attempt by PeptideProphet to
> > derive a model under conditions where the solution is metastable.
> > This is a sort of bifurcation point, to use the terminology of
> > "catastrophe theory" (I do not know the actual term in English).
> > Therefore the slightest deviation in initial conditions results in
> > dramatic differences in the final results and strongly links two
> > presumably independent experiments. Sometimes "bad" data takes over
> > and no probabilities will be assigned at all. This all stems from the
> > heuristic calibration in PeptideProphet, which makes this tool
> > vulnerable to variations in initial conditions when data are at the
> > border of the calibration range (that is true if the algorithm
> > follows the published methods). Thus, solutions provided by
> > PeptideProphet are only a measure of goodness of fit of the data to
> > the calibration data. These are not probabilities in the
> > mathematical sense; however, they will asymptotically approach the
> > experimental probabilities derived from calibration data (again, I
> > assume that what was published is what we get). Experimental
> > probabilities from calibration data presumably approximate true
> > probabilities (which is also not a given to me).
>
> > As you can see, there is a fundamental problem with PeptideProphet
> > which leads (at some conditions) to strong cross-talk between data and
> > incorrect estimation of probabilities in combined data set.
>
> > "already "interact" with each other in real life just by virtue of the
> > instrument itself"
> > If you see that, it is time to call service and clean the instrument.
>
> > Sergey
>
> > On Feb 17, 6:18 pm, David Shteynberg <dshteynb...@systemsbiology.org>
> > wrote:
> >> I don't agree that any hidden links are introduced; rather, some
> >> incorrect assumptions are exposed.  When datasets are analyzed
> >> together this is very explicit.  When you expect these datasets to
> >> be independent but find that you get different results, and
> >> different probabilities, when analyzing them together vs. apart,
> >> what you are actually finding is that
> >> these datasets are not truly independent because as you note if they
> >> were truly independent datasets the probabilities would not change.
> >> Different datasets that are collected on the same instrument (or same
> >> type of instrument) already "interact" with each other in real life
> >> just by virtue of the instrument itself.
>
> >> -David
>
> >> On Tue, Feb 17, 2009 at 2:46 PM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
>
> >> > For two independent experiments, the probability that a particular
> >> > scan matches a particular peptide should not depend on data in the
> >> > second experiment; that is why they are independent experiments.
> >> > With PeptideProphet, when you combine two data sets in interact.xml
> >> > and process them together, you will see that data from one data set
> >> > affect results derived from the other. Thus, you introduce a hidden
> >> > link between two experiments which otherwise were not interacting
> >> > with each other in real life. The correctness of this approach is
> >> > questionable, although you could always find an argument for this
> >> > mixing and even justify it from the point of view of mathematical
> >> > statistics.
>
> >> > Sergey
>
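Sergey's observation, that pooling two runs changes the probability assigned to the same spectrum, can be illustrated with a toy two-component mixture. This is only a sketch under stated assumptions: the Gaussian score distributions, their parameters, and the function names are hypothetical, and this is not PeptideProphet's actual model. It shows one mechanism only: the posterior depends on the fraction of correct matches in whatever data were modeled together.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_correct(score, frac_correct,
                      pos=(3.0, 1.0), neg=(0.0, 1.0)):
    """P(correct | score) under a two-component mixture.

    pos and neg are hypothetical (mean, sd) pairs for the score
    distributions of correct and incorrect matches; frac_correct is the
    fraction of correct matches in the data that were modeled together.
    """
    p = frac_correct * normal_pdf(score, *pos)
    n = (1.0 - frac_correct) * normal_pdf(score, *neg)
    return p / (p + n)

# The same spectrum score, modeled alone and after pooling with a
# second run that contains mostly incorrect matches.
score = 2.0
alone = posterior_correct(score, frac_correct=0.50)
pooled = posterior_correct(score, frac_correct=0.10)
print(f"alone:  {alone:.3f}")
print(f"pooled: {pooled:.3f}")
```

In this toy setting the score is identical in both calls; only the mixing proportion differs, and the posterior drops when the pooled data are dominated by incorrect matches.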
> >> > On Feb 17, 4:17 pm, David Shteynberg <dshteynb...@systemsbiology.org>
> >> > wrote:
> >> >> PeptideProphet computes a probability that a PSM is correct GIVEN the
> >> >> data.  So I am not sure I understand what you mean exactly when you
> >> >> say that this is "mathematically wrong".  It is not correct to compare
> >> >> probabilities on the same spectrum between different runs of
> >> >> PeptideProphet when you give it different data.  However, we are not
> >> >> interested in the probabilities themselves as much as we are in using
> >> >> the probabilities for estimating FDRs, error rates and sensitivities
> >> >> at different probability cutoffs.  As long as you are comparing FDRs,
> >> >> error rates and sensitivities, it doesn't matter which data you
> >> >> allow PeptideProphet to combine and model together.
>
> >> >> -David
>
> >> >> When computing this conditional probability, if the data change,
> >> >> obviously the probability of correctness will change.
>
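David's point about using probabilities to estimate error rates and sensitivities at a cutoff can be sketched as follows. This is a minimal illustration of the usual model-based estimate, assuming each value is a probability in [0, 1]; the function name and the example numbers are hypothetical and are not taken from TPP output.

```python
def error_rate_and_sensitivity(probs, cutoff):
    """Model-based estimates for PSMs kept at probability >= cutoff.

    error rate  = expected incorrect among kept / number kept
    sensitivity = expected correct among kept / expected correct overall
    """
    kept = [p for p in probs if p >= cutoff]
    total_correct = sum(probs)
    if not kept or total_correct == 0:
        return 0.0, 0.0
    error_rate = sum(1.0 - p for p in kept) / len(kept)
    sensitivity = sum(kept) / total_correct
    return error_rate, sensitivity

# Hypothetical probabilities for eight PSMs (not real data).
probs = [0.99, 0.97, 0.95, 0.90, 0.60, 0.30, 0.10, 0.02]
for cutoff in (0.5, 0.9):
    err, sens = error_rate_and_sensitivity(probs, cutoff)
    print(f"cutoff {cutoff}: error rate {err:.3f}, sensitivity {sens:.3f}")
```

Raising the cutoff lowers the estimated error rate at the cost of sensitivity, which is the trade-off the cutoff tables are meant to expose.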
> >> >> On Tue, Feb 17, 2009 at 12:05 PM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
>
> >> >> > Good advice, David:
>
> >> >> > But combining data from several independent analyses of similar
> >> >> > samples does not save the day. In fact, one "bad" (from the point
> >> >> > of view of PeptideProphet) MS/MS run can spoil the rest of the
> >> >> > data. And, in fact, it does. It does not sound right to me when
> >> >> > estimates of peptide probabilities in two independent experiments
> >> >> > start to depend on each other. However, that is precisely what is
> >> >> > happening with PeptideProphet. So is combining data, as you
> >> >> > suggested, a good idea? Hard to say. A formal approach suggests
> >> >> > that it is mathematically wrong. Common wisdom asks who cares
> >> >> > (just kidding).
>
> >> >> > Sergey
>
> >> >> > On Feb 17, 2:19 pm, David Shteynberg <dshteynb...@systemsbiology.org>
> >> >> > wrote:
> >> >> >> Hello Anand and Sergey,
>
> >> >> >> Negative values are placeholders to identify potentially correct PSMs
> >> >> >> when there is too little data, insufficient separation between
> >> >> >> positive and negative distributions in the model, or non-bimodal
> >> >> >> search results (often a result of bad search parameters, a missing
> >> >> >> static mod., etc.).  PeptideProphet attempts to fish out those ids
> >> >> >> that are above the background but assigns them negative codes equal
> >> >> >> to their charge instead of a probability.  If you look in the pepXML
> >> >> >> file, these identifications should have an analysis="incomplete"
> >> >> >> inside the peptideprophet_result tags.
>
> >> >> >> So solutions would include double-checking your search parameters,
> >> >> >> including decoys in your search database and including those when
> >> >> >> you run PeptideProphet, and combining your dataset with a similar
> >> >> >> dataset to increase sample size and improve the statistics.
>
> >> >> >> -David
>
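The check David describes, looking for analysis="incomplete" on peptideprophet_result, can be sketched with the Python standard library. The embedded fragment is a hypothetical, heavily trimmed stand-in for a pepXML file, with element and attribute names following the message above; real files carry an XML namespace and many more attributes, so treat this as an assumption-laden illustration rather than a complete pepXML reader.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed pepXML-like fragment for illustration.
SAMPLE = """
<msms_pipeline_analysis>
  <spectrum_query spectrum="run1.00012.00012.2" assumed_charge="2">
    <search_result><search_hit peptide="PEPTIDEK">
      <analysis_result analysis="peptideprophet">
        <peptideprophet_result probability="-2" analysis="incomplete"/>
      </analysis_result>
    </search_hit></search_result>
  </spectrum_query>
  <spectrum_query spectrum="run1.00015.00015.3" assumed_charge="3">
    <search_result><search_hit peptide="SAMPLER">
      <analysis_result analysis="peptideprophet">
        <peptideprophet_result probability="0.98"/>
      </analysis_result>
    </search_hit></search_result>
  </spectrum_query>
</msms_pipeline_analysis>
"""

def local_name(tag):
    """Strip an XML namespace such as '{uri}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]

def incomplete_results(root):
    """Return (spectrum, probability) for results marked incomplete."""
    found = []
    for query in root.iter():
        if local_name(query.tag) != "spectrum_query":
            continue
        for node in query.iter():
            if (local_name(node.tag) == "peptideprophet_result"
                    and node.get("analysis") == "incomplete"):
                found.append((query.get("spectrum"),
                              float(node.get("probability"))))
    return found

root = ET.fromstring(SAMPLE)
for spectrum, prob in incomplete_results(root):
    print(spectrum, prob)
```

For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`; matching on local tag names keeps the sketch working whether or not a namespace is present.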
> >> >> >> On Tue, Feb 17, 2009 at 10:35 AM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
>
> >> >> >> > Dear Natalie:
>
> >> >> >> > From a personal communication from Anand to me: "But after MSMS
> >> >> >> > file for single protein fragment peak I could neither get the
> >> >> >> > models but an error saying no data quitting that happens at the
> >> >> >> > protein prophet level. After the peptide prophet probabilities are
> >> >> >> > observed on few occasions to be in negative values."
>
> >> >> >> > So the problem is not Anand's inability to convert data to an
> >> >> >> > mzXML file but PeptideProphet's inability to estimate peptide
> >> >> >> > probabilities. I was always amused by negative numbers for "peptide
> >> >> >> > probabilities" generated by PeptideProphet. How is that possible?
> >> >> >> > Probabilities are always non-negative numbers. But after a while I
> >> >> >> > decided that if somebody wants to label some arbitrary measures of
> >> >> >> > goodness of fit as "probabilities", so be it.
>
> >> >> >> > PeptideProphet systematically fails to produce meaningful results
> >> >> >> > when the number of experimental observations is small, which is,
> >> >> >> > by the way, often the case.
> >> >> >> > The dependence of "peptide probabilities" estimated by
> >> >> >> > PeptideProphet on the presence of other data is the other
> >> >> >> > amusement provided by PeptideProphet.
>
> >> >> >> > I do not mean to undermine the importance of this free software,
> >> >> >> > but a meaningful guide is required that addresses the issues and
> >> >> >> > limitations of TPP, so that regular folks (like me, for instance)
> >> >> >> > can use it with a complete understanding of what this software
> >> >> >> > can and cannot do.
>
> >> >> >> > Sergey
>
> >> >> >> > On Feb 17, 12:34 pm, Natalie Tasman <ntas...@systemsbiology.org>
> >> >> >> > wrote:
> >> >> >> >> Hello Anand,
>
> >> >> >> >> You'll need to start by converting the Shimadzu-specific
> >> >> >> >> instrument files to mzXML or mzML files in
>
> ...
>
