Hi David:

Thank you, David, for an interesting conversation. The world you just described does not exist, fortunately I would say, and it is probably time to think about the real world, which lies outside of all-round but practically useless mathematical models.

Sergey

On Feb 18, 12:55 am, David Shteynberg <dshteynb...@systemsbiology.org> wrote:
> Thank you for this interesting discussion. My original point was to
> say that data collected on the same type of instrument and searched
> against the same database with the same parameters can be analyzed
> together in PeptideProphet.
>
> I accept your point that in the case of the additional models that
> PeptideProphet applies, specifically the models that deal with
> features of identified peptides (Number of Tolerable Termini, Number of
> Missed Cleavages, etc.), combining datasets "leads (at some conditions) to strong
> cross-talk between data and incorrect estimation of probabilities in
> combined data set", especially if there is reason to expect that the
> quality of enzymatic digestion would vary between the two datasets.
> However, if the searches are done in the same way against the same
> database there is nothing that should stop one from combining
> independent datasets collected on the same machine using models that
> would not be expected to vary between the experiments being combined:
> the search engine f-value model or the mass accuracy model, which
> models a feature of the instrument.
>
> > "already "interact" with each other in real life just by virtue of the
> > instrument itself"
> > If you see that it is a time to call service and clear the instrument.
>
> Core facilities run standards for this very reason.
>
> -David
>
> On Tue, Feb 17, 2009 at 5:39 PM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
>
> > I disagree that
> > "finding is that these datasets are not truly independent because as
> > you note if they
> > were truly independent datasets the probabilities would not change.
> > Different datasets that are collected on the same instrument (or same
> > type of instrument) already "interact" with each other in real life
> > just by virtue of the instrument itself."
> >
> > Because what you actually see is an attempt of PeptideProphet to
> > derive a model under conditions where the solution is metastable. This is a sort
> > of bifurcation point, if we apply the terminology of "catastrophe
> > theory" (I do not know the actual term in English). Therefore the slightest
> > deviation in initial conditions results in dramatic differences in
> > final results and strongly links two presumably independent
> > experiments. Sometimes "bad" data takes over and no probabilities will
> > be assigned at all. This all stems from the heuristic calibration in
> > PeptideProphet, which makes this tool vulnerable to variations in
> > initial conditions when data are at the border of the calibration
> > range (that is true if the algorithm follows published methods). Thus,
> > solutions provided by PeptideProphet are only a measure of goodness of
> > fit of the data to calibration data. These are not probabilities in
> > the mathematical sense; however, they will asymptotically approach
> > experimental probabilities derived from calibration data (again I
> > assume that what was published is what we get). Experimental
> > probabilities from calibration data presumably approximate true
> > probabilities (which is not a fact to me either).
> >
> > As you could see there is a fundamental problem with PeptideProphet
> > which leads (at some conditions) to strong cross-talk between data and
> > incorrect estimation of probabilities in combined data set.
> >
> > "already "interact" with each other in real life just by virtue of the
> > instrument itself"
> > If you see that it is a time to call service and clear the instrument.
> >
> > Sergey
> >
> > On Feb 17, 6:18 pm, David Shteynberg <dshteynb...@systemsbiology.org>
> > wrote:
> >> I don't agree that any hidden links are introduced, rather some
> >> incorrect assumptions are exposed. When datasets are analyzed
> >> together this is very explicit.
> >> When you are expecting that these
> >> datasets should be independent but you are finding that you get
> >> different results when you are analyzing them together vs. apart and
> >> you get different probabilities, what you are actually finding is that
> >> these datasets are not truly independent because as you note if they
> >> were truly independent datasets the probabilities would not change.
> >> Different datasets that are collected on the same instrument (or same
> >> type of instrument) already "interact" with each other in real life
> >> just by virtue of the instrument itself.
> >>
> >> -David
> >>
> >> On Tue, Feb 17, 2009 at 2:46 PM, mnotes <sdoro...@ms.cc.sunysb.edu> wrote:
> >>
> >> > For two independent experimental data sets, the probability that a particular scan
> >> > matches a particular peptide should not depend on data in a second
> >> > independent experiment; that is why they are independent experiments.
> >> > With PeptideProphet, when you combine two data sets in interact.xml and
> >> > process them together you will see that data from one data set will
> >> > affect results derived from the other data set. Thus, you introduce a hidden
> >> > link between two experiments which otherwise were not interacting with
> >> > each other in real life. The correctness of this approach is
> >> > questionable, although you could always find an argument to do this
> >> > mixing and even justify it from the point of view of mathematical statistics.
> >> >
> >> > Sergey
> >> >
> >> > On Feb 17, 4:17 pm, David Shteynberg <dshteynb...@systemsbiology.org>
> >> > wrote:
> >> >> PeptideProphet computes a probability that a PSM is correct GIVEN the
> >> >> data. So I am not sure I understand what you mean exactly when you
> >> >> say that this is "mathematically wrong". It is not correct to compare
> >> >> probabilities on the same spectrum between different runs of
> >> >> PeptideProphet when you give it different data.
> >> >> However, we are not
> >> >> interested in probabilities themselves as much as we are in using the
> >> >> probabilities for estimating FDRs, error rates and sensitivities at
> >> >> different probability cutoffs. As long as you are comparing FDRs,
> >> >> error rates and sensitivities then it doesn't matter which data you
> >> >> allow PeptideProphet to combine and model together.
> >> >>
> >> >> -David
> >> >>
> >> >> When computing this conditional probability, if the data changes
> >> >> obviously the probability of correctness will change.
> >> >>
> >> >> On Tue, Feb 17, 2009 at 12:05 PM, mnotes <sdoro...@ms.cc.sunysb.edu>
> >> >> wrote:
> >> >>
> >> >> > Good advice, David:
> >> >> >
> >> >> > But combining data from several independent analyses of similar
> >> >> > samples does not save the day. In fact one "bad" (from the point of view
> >> >> > of PeptideProphet) msms-run can spoil the rest of the data. And, in fact,
> >> >> > it does. It does not sound right to me when estimates of peptide
> >> >> > probabilities in two independent experiments start to depend on each
> >> >> > other. However, it is precisely what is happening with PeptideProphet.
> >> >> > So is combining data, as you suggested, a good idea? Hard to say.
> >> >> > A formal approach suggests that it is mathematically wrong. Common wisdom
> >> >> > argues: who cares (just kidding).
> >> >> >
> >> >> > Sergey
> >> >> >
> >> >> > On Feb 17, 2:19 pm, David Shteynberg <dshteynb...@systemsbiology.org>
> >> >> > wrote:
> >> >> >> Hello Anand and Sergey,
> >> >> >>
> >> >> >> Negative values are placeholders to identify potentially correct PSMs
> >> >> >> when there is too little data, insufficient separation between
> >> >> >> positive and negative distributions in the model, or non-bimodal
> >> >> >> search results (often a result of bad search parameters, a missing
> >> >> >> static mod., etc.).
> >> >> >> PeptideProphet attempts to fish out those ids that are
> >> >> >> above the background but assigns them negative codes equal to their
> >> >> >> charge instead of a probability. If you look in the pepXML file,
> >> >> >> these identifications should have an analysis="incomplete" inside the
> >> >> >> peptideprophet_result tags.
> >> >> >>
> >> >> >> So solutions would include: double check your search parameters,
> >> >> >> include decoys in your search database and include those when you run
> >> >> >> PeptideProphet, and combine your dataset with a similar dataset to increase
> >> >> >> sample size and improve the statistics.
> >> >> >>
> >> >> >> -David
> >> >> >>
> >> >> >> On Tue, Feb 17, 2009 at 10:35 AM, mnotes <sdoro...@ms.cc.sunysb.edu>
> >> >> >> wrote:
> >> >> >>
> >> >> >> > Dear Natalie:
> >> >> >> >
> >> >> >> > From personal communication of Anand to me: "But after MSMS file for
> >> >> >> > single protein fragment peak I could neither get the models but an
> >> >> >> > error saying no data quitting that happens at the protein prophet
> >> >> >> > level. After the peptide prophet probabilities are observed on few
> >> >> >> > occasions to be in negative values."
> >> >> >> >
> >> >> >> > So the problem is not in the inability of Anand to convert data to an
> >> >> >> > mzXML file but in the inability of PeptideProphet to estimate peptide
> >> >> >> > probabilities. I was always amused by negative numbers for "peptide
> >> >> >> > probabilities" generated by PeptideProphet. How is that possible?
> >> >> >> > Probabilities are never negative numbers. But after a while I decided
> >> >> >> > that if somebody wants to mark some arbitrary measures of goodness of
> >> >> >> > fit as "probabilities", so be it.
> >> >> >> >
> >> >> >> > PeptideProphet systematically fails to produce meaningful results when
> >> >> >> > the number of experimental observations is small. Which is, by the way,
> >> >> >> > often the case.
> >> >> >> >
> >> >> >> > The dependence of "peptide probabilities" estimated by PeptideProphet
> >> >> >> > on the presence of other data is the other amusement provided by
> >> >> >> > PeptideProphet.
> >> >> >> >
> >> >> >> > I do not mean to undermine the importance of this free software, but a
> >> >> >> > meaningful guide is required which will address issues and limitations
> >> >> >> > of TPP so that regular folks (like me, for instance) would use it with
> >> >> >> > complete understanding of what this software can and cannot do.
> >> >> >> >
> >> >> >> > Sergey
> >> >> >> >
> >> >> >> > On Feb 17, 12:34 pm, Natalie Tasman <ntas...@systemsbiology.org>
> >> >> >> > wrote:
> >> >> >> >> Hello Anand,
> >> >> >> >>
> >> >> >> >> You'll need to start by converting the Shimadzu-specific instrument
> >> >> >> >> files to mzXML or mzML files in
> >
> > ...

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To post to this group, send email to spctools-discuss@googlegroups.com
To unsubscribe from this group, send email to spctools-discuss+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/spctools-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---
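
A footnote to the thread, for anyone who wants to see the two main points in numbers. David's remark that the probability is computed GIVEN the data, and Sergey's observation that one "bad" run changes the probabilities of another, can both be shown with a toy two-component mixture. This is only a sketch and is not PeptideProphet's implementation: it assumes fixed Gaussian score distributions for correct and incorrect PSMs (the means 3 and 0 and unit standard deviations are made up) and varies only the mixing proportion, which a real analysis would estimate from whatever data are pooled. It also shows one common way to turn probabilities into an FDR at a cutoff, as the mean of (1 - p) over the accepted PSMs, skipping the negative placeholder values.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(score, prior_correct, pos=(3.0, 1.0), neg=(0.0, 1.0)):
    """Bayes posterior that a PSM with this score is correct.

    pos and neg are (mean, sd) of hypothetical score distributions for
    correct and incorrect PSMs; prior_correct is the mixing proportion,
    i.e. the assumed fraction of correct PSMs in the modeled data.
    """
    p = prior_correct * normal_pdf(score, *pos)
    n = (1.0 - prior_correct) * normal_pdf(score, *neg)
    return p / (p + n)


def fdr_at_cutoff(probabilities, cutoff):
    """Model-based FDR among PSMs with probability >= cutoff.

    Values outside [0, 1] (e.g. negative placeholders) are ignored.
    Returns (fdr, number_accepted); fdr is 0.0 when nothing is accepted.
    """
    accepted = [p for p in probabilities if 0.0 <= p <= 1.0 and p >= cutoff]
    if not accepted:
        return 0.0, 0
    return sum(1.0 - p for p in accepted) / len(accepted), len(accepted)


# The same spectrum score, modeled alone (half of the PSMs correct) and
# pooled with an equal-sized run in which only 5% of the PSMs are correct.
score = 2.0
alone = posterior_correct(score, prior_correct=0.50)
pooled = posterior_correct(score, prior_correct=(0.50 + 0.05) / 2)
print(f"alone:  {alone:.3f}")
print(f"pooled: {pooled:.3f}")

# FDR at a probability cutoff; -2 and -3 stand in for placeholder codes.
probs = [0.99, 0.95, 0.90, 0.50, 0.10, -2, -3]
fdr, n = fdr_at_cutoff(probs, cutoff=0.9)
print(f"FDR at p >= 0.9: {fdr:.3f} over {n} PSMs")
```

The posterior for the identical score drops once the second run is pooled in, purely because the estimated fraction of correct PSMs is lower. Whether that is an exposed assumption or an unwanted link between experiments is exactly what the thread argues about; the sketch does not settle it.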