[spctools-discuss] Re: Reg:TPP compatibility with Shimadzu MALDI-TOF-TOF data

mnotes Tue, 17 Feb 2009 17:39:55 -0800

I disagree that
"finding is that these datasets are not truly independent because as
you note if they
were truly independent datasets the probabilities would not change.
Different datasets that are collected on the same instrument (or same
type of instrument) already "interact" with each other in real life
just by virtue of the instrument itself."


Because what you actually see is PeptideProphet trying to derive a
model under conditions where the solution is metastable. This is a
kind of bifurcation point, to borrow the terminology of "catastrophe
theory" (I do not know the actual term in English). The slightest
deviation in initial conditions therefore produces dramatic
differences in the final results and strongly links two presumably
independent experiments. Sometimes the "bad" data takes over and no
probabilities are assigned at all. This all stems from the heuristic
calibration in PeptideProphet, which makes the tool vulnerable to
variations in initial conditions when the data are at the border of
the calibration range (that is true if the algorithm follows the
published methods). Thus, the solutions provided by PeptideProphet are
only a measure of goodness of fit of the data to the calibration data.
They are not probabilities in the mathematical sense; however, they
will asymptotically approach the experimental probabilities derived
from the calibration data (again, I assume that what was published is
what we get). Experimental probabilities from calibration data
presumably approximate the true probabilities (which is not a given
to me either).

As you can see, there is a fundamental problem with PeptideProphet
which leads (under some conditions) to strong cross-talk between data
sets and incorrect estimation of probabilities in the combined data set.
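
The cross-talk described above can be reproduced with a toy model. The
following is a minimal sketch, not PeptideProphet's actual algorithm: it
assumes a plain two-component Gaussian mixture fitted by EM to
one-dimensional scores, and every name in it (fit_two_gaussians,
posterior_correct, run_a, run_b) as well as the simulated score
distributions are made up for illustration. It fits the mixture to one run
alone, then to the same run pooled with a second, poorly separated run, and
reports the posterior for the same score in both cases.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, iterations=100, min_sd=0.05):
    """Fit a two-component Gaussian mixture to scores by plain EM.

    Returns (weight_hi, means, sds); index 1 is the higher-mean component,
    which this sketch treats as the "correct" identifications.
    """
    ordered = sorted(scores)
    n = len(ordered)
    mean_all = sum(ordered) / n
    sd_all = math.sqrt(sum((s - mean_all) ** 2 for s in ordered) / n)
    # Crude starting point: quartiles as means, overall spread as both SDs.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    sds = [max(sd_all, min_sd), max(sd_all, min_sd)]
    weight_hi = 0.5
    for _ in range(iterations):
        # E-step: responsibility of the higher-mean component for each score.
        resp = []
        for s in scores:
            lo = (1.0 - weight_hi) * normal_pdf(s, means[0], sds[0])
            hi = weight_hi * normal_pdf(s, means[1], sds[1])
            total = lo + hi
            resp.append(hi / total if total > 0.0 else 0.5)
        n_hi = sum(resp)
        n_lo = n - n_hi
        if n_hi < 1e-9 or n_lo < 1e-9:
            break  # one component has collapsed; keep the last estimate
        # M-step: re-estimate means, SDs and weight from the responsibilities.
        means[1] = sum(r * s for r, s in zip(resp, scores)) / n_hi
        means[0] = sum((1.0 - r) * s for r, s in zip(resp, scores)) / n_lo
        var_hi = sum(r * (s - means[1]) ** 2
                     for r, s in zip(resp, scores)) / n_hi
        var_lo = sum((1.0 - r) * (s - means[0]) ** 2
                     for r, s in zip(resp, scores)) / n_lo
        sds = [max(math.sqrt(var_lo), min_sd), max(math.sqrt(var_hi), min_sd)]
        weight_hi = n_hi / n
    if means[0] > means[1]:  # keep index 1 as the higher-mean component
        means.reverse()
        sds.reverse()
        weight_hi = 1.0 - weight_hi
    return weight_hi, means, sds


def posterior_correct(score, model):
    """Posterior that score belongs to the higher-mean component."""
    weight_hi, means, sds = model
    lo = (1.0 - weight_hi) * normal_pdf(score, means[0], sds[0])
    hi = weight_hi * normal_pdf(score, means[1], sds[1])
    total = lo + hi
    return hi / total if total > 0.0 else 0.5


# Simulated scores (hypothetical): run A is well separated, run B is not.
rng = random.Random(0)
run_a = ([rng.gauss(0.0, 1.0) for _ in range(300)]
         + [rng.gauss(4.0, 1.0) for _ in range(200)])
run_b = [rng.gauss(1.0, 1.5) for _ in range(1000)]

score = 2.0  # the same score, evaluated under both fitted models
p_alone = posterior_correct(score, fit_two_gaussians(run_a))
p_combined = posterior_correct(score, fit_two_gaussians(run_a + run_b))
print("run A alone:      ", round(p_alone, 3))
print("run A + B pooled: ", round(p_combined, 3))
```

Because the mixture parameters are estimated from whatever data are pooled,
the posterior for an identical score generally shifts once the second run is
added, which is the dependence being argued about in this thread.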

"already "interact" with each other in real life just by virtue of the
instrument itself"
If you see that, it is time to call service and clean the instrument.

Sergey

On Feb 17, 6:18 pm, David Shteynberg <dshteynb...@systemsbiology.org>
wrote:
> I don't agree that any hidden links are introduced, rather some
> incorrect assumptions are exposed.  When datasets are analyzed
> together this is very explicit.  When you are expecting that these
> datasets should be independent but you are finding that you get
> different results when you are analyzing them together vs. apart and
> you get different probabilities what you are actually finding is that
> these datasets are not truly independent because as you note if they
> were truly independent datasets the probabilities would not change.
> Different datasets that are collected on the same instrument (or same
> type of instrument) already "interact" with each other in real life
> just by virtue of the instrument itself.
>
> -David
>
> On Tue, Feb 17, 2009 at 2:46 PM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
>
> > For two independent experimental data sets, the probability that a
> > particular scan matches a particular peptide should not depend on the
> > data in the second experiment; that is why they are independent.
> > With PeptideProphet, when you combine two data sets in interact.xml and
> > process them together, you will see that data from one data set
> > affect the results derived from the other. Thus, you introduce a hidden
> > link between two experiments which otherwise were not interacting with
> > each other in real life. The correctness of this approach is
> > questionable, although you could always find an argument for this
> > mixing and even justify it from the point of view of mathematical
> > statistics.
>
> > Sergey
>
> > On Feb 17, 4:17 pm, David Shteynberg <dshteynb...@systemsbiology.org>
> > wrote:
> >> PeptideProphet computes a probability that a PSM is correct GIVEN the
> >> data.  So I am not sure I understand what you mean exactly when you
> >> say that this is "mathematically wrong".  It is not correct to compare
> >> probabilities on the same spectrum between different runs of
> >> PeptideProphet when you give it different data.  However, we are not
> >> interested in probabilities themselves as much as we are in using the
> >> probabilities for estimating FDRs, error rates and sensitivities at
> >> different probability cutoffs.  As long as you are comparing FDRs,
> >> error rates and sensitivities then it doesn't matter which data you
> >> allow PeptideProphet to combine and model together.
>
> >> -David
>
> >> When computing this conditional probability, if the data changes,
> >> obviously the probability of correctness will change.
>
> >> On Tue, Feb 17, 2009 at 12:05 PM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
>
> >> > Good advice, David.
>
> >> > But combining data from several independent analyses of similar
> >> > samples does not save the day. In fact, one "bad" (from the point of
> >> > view of PeptideProphet) MS/MS run can spoil the rest of the data, and
> >> > it does. It does not sound right to me when the estimates of peptide
> >> > probabilities in two independent experiments start to depend on each
> >> > other. However, that is precisely what is happening with PeptideProphet.
> >> > So is combining data as you suggested a good idea? Hard to say.
> >> > The formal approach suggests that it is mathematically wrong. Common
> >> > wisdom asks who cares (just kidding).
>
> >> > Sergey
>
> >> > On Feb 17, 2:19 pm, David Shteynberg <dshteynb...@systemsbiology.org>
> >> > wrote:
> >> >> Hello Anand and Sergey,
>
> >> >> Negative values are placeholders to identify potentially correct PSMs
> >> >> when there is too little data, insufficient separation between
> >> >> positive and negative distributions in the model, or non-bimodal
> >> >> search results (often a result of bad search parameters, a missing
> >> >> static mod., etc.).  PeptideProphet attempts to fish out those ids
> >> >> that are above the background but assigns them negative codes equal
> >> >> to their charge instead of a probability.  If you look in the pepXML
> >> >> file, these identifications should have an analysis="incomplete"
> >> >> inside the peptideprophet_result tags.
>
> >> >> So solutions would include double-checking your search parameters,
> >> >> including decoys in your search database and including those when you
> >> >> run PeptideProphet, and combining your dataset with a similar dataset
> >> >> to increase sample size and improve the statistics.
>
> >> >> -David
>
> >> >> On Tue, Feb 17, 2009 at 10:35 AM, mnotes <sdoro...@ms.cc.sunysb.edu> 
> >> >> wrote:
>
> >> >> > Dear Natalie:
>
> >> >> > From personal communication of Anand to me : "But after MSMS file for
> >> >> > single protein fragment peak I could neither get the models but an
> >> >> > error saying no data quitting that happens at the protein prophet
> >> >> > level.After the peptide prophet probabilities are observed on few
> >> >> > occasions to be in negative values."
>
> >> >> > So the problem is not Anand's inability to convert data to an mzXML
> >> >> > file but PeptideProphet's inability to estimate peptide
> >> >> > probabilities. I was always amused by negative numbers for "peptide
> >> >> > probabilities" generated by PeptideProphet. How is that possible?
> >> >> > Probabilities are never negative. But after a while I decided
> >> >> > that if somebody wants to label some arbitrary measures of goodness
> >> >> > of fit as "probabilities", so be it.
>
> >> >> > PeptideProphet systematically fails to produce meaningful results
> >> >> > when the number of experimental observations is small, which is, by
> >> >> > the way, often the case.
> >> >> > The dependence of "peptide probabilities" estimated by PeptideProphet
> >> >> > on the presence of other data is the other amusement provided by
> >> >> > PeptideProphet.
>
> >> >> > I do not mean to undermine the importance of this free software, but
> >> >> > a meaningful guide is required which addresses the issues and
> >> >> > limitations of TPP so that regular folks (like me, for instance) can
> >> >> > use it with a complete understanding of what this software can and
> >> >> > cannot do.
>
> >> >> > Sergey
>
> >> >> > On Feb 17, 12:34 pm, Natalie Tasman <ntas...@systemsbiology.org>
> >> >> > wrote:
> >> >> >> Hello Anand,
>
> >> >> >> You'll need to start by converting the Shimadzu-specific instrument
> >> >> >> files to mzXML or mzML files in order to use the TPP tools.  The
> >> >> >> SPC/TPP tools do not support this directly, but someone posted to our
> >> >> >> list mentioning the Mass++ program which is advertised to do this:
>
> >> >> >> http://groups.google.com/group/spctools-discuss/browse_thread/thread/...
>
> >> >> >> Please let us know what your experiences are with this program,
>
> >> >> >> Natalie
>
> >> >> >> On Tue, Feb 17, 2009 at 3:52 AM, Research_anand 
> >> >> >> <anand1...@gmail.com> wrote:
>
> >> >> >> > Hi all ,
>
> >> >> >> > I have been trying to upload and analyze single-protein MS/MS CID
> >> >> >> > data retrieved from a Shimadzu instrument and I am unable to run
> >> >> >> > the pipeline. The pipeline fails at the ProteinProphet level with
> >> >> >> > the error message "No data quitting". I also tried converting the
> >> >> >> > mzXML files to dta as well as mgf files through mzXML2Search, but
> >> >> >> > the conversion was impossible.
>
> >> >> >> > At this juncture I am unable to proceed further, for reasons I do
> >> >> >> > not know. Kindly also suggest any test data set that can be run on
> >> >> >> > TPP as a standard.
>
> >> >> >> > Is there any correction that I have to make before running TPP on
> >> >> >> > the mzXML data file?
>
> >> >> >> > Regards & Thanks
>
> >> >> >> > Anand