[spctools-discuss] Re: Reg:TPP combatibility with Shimadzu MALDI-TOF-TOF data

David Shteynberg Tue, 17 Feb 2009 21:55:23 -0800

Thank you for this interesting discussion.  My original point was to
say that data collected on the same type of instrument and searched
against the same database with the same parameters can be analyzed
together in PeptideProphet.


I accept your point that, for the additional models that
PeptideProphet applies, specifically the models that deal with
features of the identified peptides (Number of Tolerable Termini,
Number of Missed Cleavages, etc.), combining data "leads (at some
conditions) to strong cross-talk between data and incorrect estimation
of probabilities in combined data set", especially if there is reason
to expect that the quality of enzymatic digestion would vary between
the two datasets.  However, if the searches are done in the same way
against the same database, there is nothing that should stop one from
combining independent datasets collected on the same machine using
models that would not be expected to vary between the experiments
being combined: the search engine f-value model, or the mass accuracy
model, which models a feature of the instrument.
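
As a rough illustration of why the probabilities move when datasets are
modelled together, here is a minimal Python sketch.  It is not
PeptideProphet's code: it assumes a plain two-component Gaussian
mixture fitted by EM as a stand-in for the score model, and the
function names, the simulated score distributions, and the run sizes
are all made up for the example.  It fits the mixture to one run alone
and to the same run pooled with a second run that contains no correct
identifications, then reports the posterior for the same score under
each fit.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, iterations=300):
    """Fit a two-component 1-D Gaussian mixture to scores by EM.

    Returns (weight_correct, (mean_incorrect, sd_incorrect),
             (mean_correct, sd_correct)).
    """
    ordered = sorted(scores)
    n = len(ordered)
    # Crude start: 10th and 90th percentiles as the two component means.
    mu0, mu1 = ordered[n // 10], ordered[(9 * n) // 10]
    sd0 = sd1 = max(0.05, (ordered[-1] - ordered[0]) / 6.0)
    w1 = 0.5
    for _ in range(iterations):
        # E-step: responsibility of the "correct" component per score.
        resp = []
        for x in scores:
            p1 = w1 * normal_pdf(x, mu1, sd1)
            p0 = (1.0 - w1) * normal_pdf(x, mu0, sd0)
            resp.append(p1 / (p0 + p1) if p0 + p1 > 0.0 else 0.5)
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # a component emptied out; stop before dividing by zero
        # M-step: re-estimate means, spreads and the mixing weight.
        mu1 = sum(r * x for r, x in zip(resp, scores)) / n1
        mu0 = sum((1.0 - r) * x for r, x in zip(resp, scores)) / n0
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, scores)) / n1
        var0 = sum((1.0 - r) * (x - mu0) ** 2
                   for r, x in zip(resp, scores)) / n0
        sd1 = max(0.05, math.sqrt(var1))  # floor keeps a component
        sd0 = max(0.05, math.sqrt(var0))  # from collapsing to a point
        w1 = n1 / n
    return w1, (mu0, sd0), (mu1, sd1)


def posterior_correct(score, model):
    """Posterior that a score belongs to the 'correct' component."""
    w1, (mu0, sd0), (mu1, sd1) = model
    p1 = w1 * normal_pdf(score, mu1, sd1)
    p0 = (1.0 - w1) * normal_pdf(score, mu0, sd0)
    return p1 / (p0 + p1) if p0 + p1 > 0.0 else 0.5


def simulate_run(rng, n_incorrect, n_correct):
    """Made-up scores: incorrect ~ N(0, 1), correct ~ N(4, 1)."""
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_incorrect)]
    scores += [rng.gauss(4.0, 1.0) for _ in range(n_correct)]
    return scores


def compare_alone_vs_pooled(seed=1, score=2.0):
    """Posterior for one score with run A alone vs. A pooled with B."""
    rng = random.Random(seed)
    run_a = simulate_run(rng, 300, 200)   # run with many correct hits
    run_b = simulate_run(rng, 1500, 0)    # run with no correct hits
    model_a = fit_two_gaussians(run_a)
    model_pooled = fit_two_gaussians(run_a + run_b)
    return (posterior_correct(score, model_a),
            posterior_correct(score, model_pooled))


if __name__ == "__main__":
    p_alone, p_pooled = compare_alone_vs_pooled()
    print("P(correct | score=2.0), run A alone :", round(p_alone, 3))
    print("P(correct | score=2.0), A + B pooled:", round(p_pooled, 3))
```

In this toy setup the two printed values should differ, because the
fitted mixing proportion depends on which data are pooled.  That is
the sense in which the probability is conditional on the data that
were modelled together.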


> "already "interact" with each other in real life just by virtue of the
> instrument itself"
> If you see that it is a time to call service and clear the instrument.
Core facilities run standards for this very reason.


-David

On Tue, Feb 17, 2009 at 5:39 PM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
>
> I disagree that
> "finding is that these datasets are not truly independent because as
> you note if they
> were truly independent datasets the probabilities would not change.
> Different datasets that are collected on the same instrument (or same
> type of instrument) already "interact" with each other in real life
> just by virtue of the instrument itself."
>
> Because what you actually see is an attempt by PeptideProphet to
> derive a model under conditions where the solution is metastable. This
> is a sort of bifurcation point, to borrow the terminology of
> "catastrophe theory" (I do not know the actual term in English).
> Therefore the slightest deviation in initial conditions results in
> dramatic differences in the final results and strongly links two
> presumably independent experiments. Sometimes "bad" data takes over
> and no probabilities are assigned at all. This all stems from the
> heuristic calibration in PeptideProphet, which makes the tool
> vulnerable to variations in initial conditions when the data are at
> the border of the calibration range (that is true if the algorithm
> follows the published methods). Thus, the solutions provided by
> PeptideProphet are only a measure of goodness of fit of the data to
> the calibration data. These are not probabilities in the mathematical
> sense; however, they will asymptotically approach the experimental
> probabilities derived from the calibration data (again, I assume that
> what was published is what we get). Experimental probabilities from
> calibration data presumably approximate the true probabilities (which
> is not a given for me either).
>
> As you could see there is a fundamental problem with PeptideProphet
> which leads (at some conditions) to strong cross-talk between data and
> incorrect estimation of probabilities in combined data set.
>
> "already "interact" with each other in real life just by virtue of the
> instrument itself"
> If you see that it is a time to call service and clear the instrument.
>
> Sergey
>
> On Feb 17, 6:18 pm, David Shteynberg <dshteynb...@systemsbiology.org>
> wrote:
>> I don't agree that any hidden links are introduced, rather some
>> incorrect assumptions are exposed.  When datasets are analyzed
>> together this is very explicit.  When you are expecting that these
>> datasets should be independent but you are finding that you get
>> different results when you are analyzing them together vs. apart and
>> you get different probabilities what you are actually finding is that
>> these datasets are not truly independent because as you note if they
>> were truly independent datasets the probabilities would not change.
>> Different datasets that are collected on the same instrument (or same
>> type of instrument) already "interact" with each other in real life
>> just by virtue of the instrument itself.
>>
>> -David
>>
>> On Tue, Feb 17, 2009 at 2:46 PM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
>>
>> > For two independent experiments, the probability that a particular
>> > scan matches a particular peptide should not depend on the data in
>> > the second experiment; that is why they are independent experiments.
>> > With PeptideProphet, when you combine two data sets in interact.xml
>> > and process them together, you will see that data from one data set
>> > affects the results derived from the other. Thus, you introduce a
>> > hidden link between two experiments which otherwise were not
>> > interacting with each other in real life. The correctness of this
>> > approach is questionable, although you could always find an argument
>> > for this mixing and even justify it from the point of view of
>> > mathematical statistics.
>>
>> > Sergey
>>
>> > On Feb 17, 4:17 pm, David Shteynberg <dshteynb...@systemsbiology.org>
>> > wrote:
>> >> PeptideProphet computes a probability that a PSM is correct GIVEN the
>> >> data.  So I am not sure I understand what you mean exactly when you
>> >> say that this is "mathematically wrong".  It is not correct to compare
>> >> probabilities on the same spectrum between different runs of
>> >> PeptideProphet when you give it different data.  However, we are not
>> >> interested in probabilities themselves as much as we are in using the
>> >> probabilities for estimating FDRs, error rates and sensitivities at
>> >> different probability cutoffs.  As long as you are comparing FDRs,
>> >> error rates and sensitivities then it doesn't matter which data you
>> >> allow PeptideProphet to combine and model together.
>>
>> >> -David
>>
>> >> When computing this conditional probability, if the data changes,
>> >> obviously the probability of correctness will change.
>>
>> >> On Tue, Feb 17, 2009 at 12:05 PM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
>>
>> >> > Good advice, David.
>>
>> >> > But combining data from several independent analyses of similar
>> >> > samples does not save the day. In fact, one "bad" (from the point of
>> >> > view of PeptideProphet) MS/MS run can spoil the rest of the data,
>> >> > and it does. It does not sound right to me when the estimates of
>> >> > peptide probabilities in two independent experiments start to depend
>> >> > on each other. However, that is precisely what happens with
>> >> > PeptideProphet. So is combining data, as you suggested, a good idea?
>> >> > Hard to say. The formal approach suggests that it is mathematically
>> >> > wrong. Common wisdom asks: who cares (just kidding).
>>
>> >> > Sergey
>>
>> >> > On Feb 17, 2:19 pm, David Shteynberg <dshteynb...@systemsbiology.org>
>> >> > wrote:
>> >> >> Hello Anand and Sergey,
>>
>> >> >> Negative values are placeholders to identify potentially correct PSMs
>> >> >> when there is too little data, insufficient separation between
>> >> >> positive and negative distributions in the model, or non-bimodal
>> >> >> search results (often a result of bad search parameters, such as a
>> >> >> missing static modification).  PeptideProphet attempts to fish out
>> >> >> those IDs that are above the background but assigns them negative
>> >> >> codes equal to their charge instead of a probability.  If you look in
>> >> >> the pepXML file, these identifications should have an
>> >> >> analysis="incomplete" inside the peptideprophet_result tags.
>>
>> >> >> So solutions would include double-checking your search parameters,
>> >> >> including decoys in your search database and including those when
>> >> >> you run PeptideProphet, and combining your dataset with a similar
>> >> >> dataset to increase sample size and improve the statistics.
>>
>> >> >> -David
>>
>> >> >> On Tue, Feb 17, 2009 at 10:35 AM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
>>
>> >> >> > Dear Natalie:
>>
>> >> >> > From personal communication of Anand to me : "But after MSMS file for
>> >> >> > single protein fragment peak I could neither get the models but an
>> >> >> > error saying no data quitting that happens at the protein prophet
>> >> >> > level.After the peptide prophet probabilities are observed on few
>> >> >> > occasions to be in negative values."
>>
>> >> >> > So the problem is not Anand's inability to convert data to an mzXML
>> >> >> > file but PeptideProphet's inability to estimate peptide
>> >> >> > probabilities. I was always amused by negative numbers for "peptide
>> >> >> > probabilities" generated by PeptideProphet. How is that possible?
>> >> >> > Probabilities are never negative. But after a while I decided
>> >> >> > that if somebody wants to label some arbitrary measure of goodness
>> >> >> > of fit as "probabilities", so be it.
>>
>> >> >> > PeptideProphet systematically fails to produce meaningful results
>> >> >> > when the number of experimental observations is small, which is, by
>> >> >> > the way, often the case.
>> >> >> > The dependence of the "peptide probabilities" estimated by
>> >> >> > PeptideProphet on the presence of other data is the other amusement
>> >> >> > it provides.
>>
>> >> >> > I do not mean to undermine the importance of this free software, but
>> >> >> > a meaningful guide is required that addresses the issues and
>> >> >> > limitations of TPP, so that regular folks (like me, for instance)
>> >> >> > can use it with a complete understanding of what this software can
>> >> >> > and cannot do.
>>
>> >> >> > Sergey
>>
>> >> >> > On Feb 17, 12:34 pm, Natalie Tasman <ntas...@systemsbiology.org>
>> >> >> > wrote:
>> >> >> >> Hello Anand,
>>
>> >> >> >> You'll need to start by converting the Shimadzu-specific instrument
>> >> >> >> files to mzXML or mzML files in order to use the TPP tools.  The
>> >> >> >> SPC/TPP tools do not support this directly, but someone posted to
>> >> >> >> our list mentioning the Mass++ program, which is advertised to do
>> >> >> >> this:
>>
>> >> >> >> http://groups.google.com/group/spctools-discuss/browse_thread/thread/...
>>
>> >> >> >> Please let us know what your experiences are with this program,
>>
>> >> >> >> Natalie
>>
>> >> >> >> On Tue, Feb 17, 2009 at 3:52 AM, Research_anand <anand1...@gmail.com> wrote:
>>
>> >> >> >> > Hi all,
>>
>> >> >> >> > I have been trying to upload and analyze single-protein MS/MS CID
>> >> >> >> > data retrieved from a Shimadzu instrument, and I am unable to run
>> >> >> >> > the pipeline. The pipeline fails at the ProteinProphet level with
>> >> >> >> > the error message "No data quitting". I also tried converting the
>> >> >> >> > mzXML files to dta and mgf files through mzXML2Search, but the
>> >> >> >> > conversion was impossible.
>>
>> >> >> >> > At this point I am unable to proceed further, for reasons I do
>> >> >> >> > not know. Kindly also suggest any test data set that can be run
>> >> >> >> > on TPP as a standard.
>>
>> >> >> >> > Is there any correction that I have to apply before running TPP
>> >> >> >> > on the mzXML data file?
>>
>> >> >> >> > Regards & Thanks
>>
>> >> >> >> > Anand
