Hi Jagan,

My intuition is that it is advisable to use a consistent database and similar search parameters whenever possible when combining search engines with iProphet. I have nothing to offer beyond that intuition, since I have not actually tried this. We now always include decoys in the databases we use. I suspect the statistical models used by iProphet may not apply equally to spectral matches coming from different search engines, which may bias the results.
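A minimal sketch of the kind of pre-check this suggests, not something tested in this thread: before handing several PeptideProphet outputs to iProphet, read each pepXML file and compare the database path and the number of decoy hits. It assumes the pepXML elements `search_summary`, `search_database` (attribute `local_path`) and `search_hit` (attributes `hit_rank`, `protein`), and that decoy entries carry the `decoy` prefix used with `DECOY=decoy` in the logs below. The function names and the two tiny sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace: '{uri}search_hit' becomes 'search_hit'."""
    return tag.rsplit("}", 1)[-1]


def summarize_pepxml(xml_text, decoy_prefix="decoy"):
    """Return engine, database path, and top-hit decoy/target counts."""
    root = ET.fromstring(xml_text)
    summary = {"engine": None, "database": None, "decoys": 0, "targets": 0}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "search_summary":
            summary["engine"] = elem.get("search_engine")
        elif name == "search_database":
            summary["database"] = elem.get("local_path")
        elif name == "search_hit" and elem.get("hit_rank", "1") == "1":
            is_decoy = elem.get("protein", "").startswith(decoy_prefix)
            summary["decoys" if is_decoy else "targets"] += 1
    return summary


def check_consistency(summaries):
    """List problems that would make a combined analysis suspect."""
    problems = []
    databases = {s["database"] for s in summaries}
    if len(databases) > 1:
        names = ", ".join(sorted(map(str, databases)))
        problems.append("searches used different databases: " + names)
    for s in summaries:
        if s["decoys"] == 0:
            problems.append(f"{s['engine']}: no hits with the decoy prefix")
    return problems


# Hypothetical, heavily trimmed pepXML documents (illustration only).
MASCOT_SAMPLE = """<msms_pipeline_analysis><msms_run_summary>
  <search_summary search_engine="MASCOT">
    <search_database local_path="sprot.fasta"/>
  </search_summary>
  <spectrum_query spectrum="s1"><search_result>
    <search_hit hit_rank="1" protein="P12345"/>
  </search_result></spectrum_query>
</msms_run_summary></msms_pipeline_analysis>"""

OMSSA_SAMPLE = """<msms_pipeline_analysis><msms_run_summary>
  <search_summary search_engine="OMSSA">
    <search_database local_path="decoy_sprot.fasta"/>
  </search_summary>
  <spectrum_query spectrum="s1"><search_result>
    <search_hit hit_rank="1" protein="decoy_P12345"/>
  </search_result></spectrum_query>
  <spectrum_query spectrum="s2"><search_result>
    <search_hit hit_rank="1" protein="P67890"/>
  </search_result></spectrum_query>
</msms_run_summary></msms_pipeline_analysis>"""

summaries = [summarize_pepxml(MASCOT_SAMPLE), summarize_pepxml(OMSSA_SAMPLE)]
for problem in check_consistency(summaries):
    print(problem)
```

With real data the strings would be replaced by the contents of the per-engine pepXML files; the check only flags mismatches and does not decide whether a combination is statistically valid.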
-David

On Fri, Jul 16, 2010 at 9:42 PM, Jagan Kommineni <[email protected]> wrote:
> Dear David,
>
> At APCF, we are using multiple algorithms (Mascot, X!tandem and OMSSA currently, and Crux in the near future) to process the same data. The standard search results are passed to TPP for postprocessing. After the PeptideProphet parser runs, we combine the results into one output file by merging the result files with your iProphet. The iProphet result file is then processed with the ProteinProphet parser. We are also planning to quantify the peptides produced by iProphet using your tools ASAP, XPRESS and Libra before running the ProteinProphet parser.
>
> I would like to know the overall impact on the results when we combine the non-decoy PeptideProphet results for Mascot and X!tandem with the decoy-based PeptideProphet results for OMSSA when running iProphet and the other TPP postprocessing tools (e.g. quantitation and ProteinProphetParser). We are really keen to learn more in this space.
>
> If you think it is advisable to use a consistent decoy database across all the algorithms (Mascot, X!tandem, OMSSA and Crux) when a user wants to include OMSSA search results in the TPP postprocessing task, would you mind advising us?
>
> with regards,
>
> Jagan Kommineni
>
> On Sat, Jul 17, 2010 at 1:01 AM, David Shteynberg <[email protected]> wrote:
>>
>> Dear Jagan,
>>
>> The modelling for OMSSA (also Inspect and Myrimatch) is done with semi-parametric modeling which *requires* decoys to learn the shapes of the mixture model distributions. Without decoys in the database these search engines cannot be processed through the TPP. Why are you reluctant to include decoys in the model? We sometimes use two independent sets of decoys in the database, where one set is used for the semi-parametric modelling and the other to independently evaluate the model against another decoy set.
>> Also, a match that is significant is not necessarily correct; decoy matches with significant scores are common.
>>
>> -David
>>
>> On Fri, Jul 16, 2010 at 12:03 AM, Jagan Kommineni <[email protected]> wrote:
>> > Dear David,
>> >
>> > After increasing the e-value to 1e6, I ran the OMSSA search with standard fasta and decoy databases (generated using TPP's decoyFASTA) with identical input data and parameters.
>> >
>> > In the first case (non-decoy), 101 of 18,872 peptide matches were significant; in the latter case (decoy), 89 of 18,012 peptide matches were significant.
>> >
>> > I used the same input file, which contains 3,444 spectra, in both experiments.
>> >
>> > When I ran PeptideProphetParser against the non-decoy database I got a segmentation fault, even though the standard OMSSA search returned a similar set of results (a similar set of false positives) in both cases. Here is the STDOUT of the PeptideProphetParser run for NON-DECOY (standard fasta database).
>> >
>> > ------------------------------------------------------------
>> >
>> > [r...@compute-3-0 2010-07-16]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/InteractParser 'jagan-J1229.pepprophet.xml' 'jagan-J1229.pep.xml' -L'7' -E'trypsin' -C -P
>> > file 1: jagan-J1229.pep.xml
>> > processed altogether 3635 results
>> >
>> > results written to file /mnt/sanfs/APCF/results/tpp/2010-07-16/jagan-J1229.pepprophet.shtml
>> >
>> > [r...@compute-3-0 2010-07-16]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/PeptideProphetParser 'jagan-J1229.pepprophet.xml' DECOY=decoy MINPROB=0 NONPARAM
>> > Using Decoy Label "decoy".
>> > Using non-parametric distributions
>> > (OMSSA) (minprob 0)
>> > WARNING!! The discriminant function for OMSSA is not yet complete.
>> > It is presented here to help facilitate trial and discussion. Reliance on this code for publishable scientific results is not recommended.
>> > init with OMSSA trypsin
>> > MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN
>> >
>> > PeptideProphet (TPP v4.4 JETSTREAM (unstable development prerelease) rev 0, Build 201007011135 (linux)) akel...@isb
>> > read in 272 1+, 1490 2+, 1766 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
>> > Initialising statistical models ...
>> > Found 0 Decoys, and 3528 Non-Decoys
>> > WARNING: No decoys with label decoy were found in this dataset. reverting to fully unsupervised method.
>> > Iterations: .........10.........20
>> > Segmentation fault
>> > [r...@compute-3-0 2010-07-16]#
>> >
>> > ------------------------------------------------------------
>> >
>> > As mentioned, in the latter case, where I used the decoy database for the OMSSA search, PeptideProphetParser issues only warning messages and I am able to view the pepXML files without any trouble. But when I run the TPP pipeline on a similar input file after a standard Mascot search, I see 0 hits for charges 4, 5, 6 and 7 but no warning messages.
>> >
>> > ------------------------------------------------------------
>> >
>> > [r...@compute-3-0 2010-07-16]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/InteractParser 'jagan-J237.pepprophet.xml' 'jagan-J237.pep.xml' -L'7' -E'trypsin' -C -P
>> > file 1: jagan-J237.pep.xml
>> > processed altogether 3417 results
>> >
>> > results written to file /mnt/sanfs/APCF/results/tpp/2010-07-16/jagan-J237.pepprophet.shtml
>> >
>> > [r...@compute-3-0 2010-07-16]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/PeptideProphetParser 'jagan-J237.pepprophet.xml' DECOY=decoy MINPROB=0 NONPARAM
>> > Using Decoy Label "decoy".
>> > Using non-parametric distributions
>> > (OMSSA) (minprob 0)
>> > WARNING!! The discriminant function for OMSSA is not yet complete. It is presented here to help facilitate trial and discussion. Reliance on this code for publishable scientific results is not recommended.
>> > init with OMSSA trypsin
>> > MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN
>> >
>> > PeptideProphet (TPP v4.4 JETSTREAM (unstable development prerelease) rev 0, Build 201007011135 (linux)) akel...@isb
>> > read in 213 1+, 1376 2+, 1766 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
>> > Initialising statistical models ...
>> > Found 997 Decoys, and 2358 Non-Decoys
>> > Iterations: .........10.........20.....
>> > WARNING: Mixture model quality test failed for charge (1+).
>> > WARNING: Mixture model quality test failed for charge (4+).
>> > WARNING: Mixture model quality test failed for charge (5+).
>> > WARNING: Mixture model quality test failed for charge (6+).
>> > WARNING: Mixture model quality test failed for charge (7+).
>> > model complete after 26 iterations
>> > [r...@compute-3-0 2010-07-16]#
>> >
>> > ------------------------------------------------------------
>> >
>> > I wonder whether there is any way to run TPP on OMSSA results produced with a standard fasta database rather than a decoy database.
>> >
>> > I have put the OMSSA search result files on the APCF wiki; here is the link:
>> >
>> > https://search.apcf.edu.au/wiki/index.php/Apcfwiki:Community_Portal#APCF__OMSSA_files
>> >
>> > OMSSA files: jagan-J1229.omx (non-decoy output), jagan-J237.omx (decoy output) and O070512-01.mgf (input file).
>> >
>> > with regards,
>> >
>> > Jagan Kommineni
>> >
>> > On Fri, Jul 9, 2010 at 7:18 AM, David Shteynberg <[email protected]> wrote:
>> >>
>> >> Hi Jagan,
>> >>
>> >> It appears that the QC filters were triggered on the PeptideProphet MixtureModel. This is likely due to too few data points in the analysis for good stats: read in 0 1+, 82 2+, 44 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
>> >>
>> >> With OMSSA this could be due to too low an e-value setting, which filters out many results that the model could use to better model the negative and positive distributions. Set your OMSSA e-value to a high value like 1e6 and this problem will likely go away, unless you don't have very many correct results due to wrong parameters, bad data or something else.
>> >>
>> >> Hope this helps.
>> >>
>> >> -David
>> >>
>> >> On Fri, Jul 2, 2010 at 1:02 AM, Jagan Kommineni <[email protected]> wrote:
>> >> > Dear All,
>> >> >
>> >> > I have created a decoy database for the SwissProt database using decoyFASTA from the TPP distribution, and ran the following TPP commands after the OMSSA search with the decoy database. Here is the output on STDOUT ...
>> >> >
>> >> > ---------------------------------------
>> >> > [r...@compute-3-0 run-on-compute]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/InteractParser 'jagan-J128.pepprophet.xml' 'jagan-J128.pep.xml' -D'/home/APCF/databases/SwissProt/uniprot_sprot_Jan2009/decoy/decoy_uniprot_sprot.fasta' -L'7' -E'trypsin' -C -P
>> >> > file 1: jagan-J128.pep.xml
>> >> > processed altogether 126 results
>> >> >
>> >> > results written to file /mnt/sanfs/APCF/results/omssa/decoy_test_run/run-on-compute/jagan-J128.pepprophet.shtml
>> >> >
>> >> > [r...@compute-3-0 run-on-compute]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/PeptideProphetParser 'jagan-J128.pepprophet.xml' DECOY=decoy MINPROB=0 NONPARAM
>> >> > Using Decoy Label "decoy".
>> >> > Using non-parametric distributions
>> >> > (OMSSA) (minprob 0)
>> >> > WARNING!! The discriminant function for OMSSA is not yet complete. It is presented here to help facilitate trial and discussion. Reliance on this code for publishable scientific results is not recommended.
>> >> > init with OMSSA Trypsin
>> >> > MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN
>> >> >
>> >> > PeptideProphet (TPP v4.3 JETSTREAM rev 1, Build 201003241044 (linux)) akel...@isb
>> >> > read in 0 1+, 82 2+, 44 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
>> >> > Initialising statistical models ...
>> >> > Iterations: .........10.........20.....
>> >> > WARNING: Mixture model quality test failed for charge (1+).
>> >> > WARNING: Mixture model quality test failed for charge (2+).
>> >> > WARNING: Mixture model quality test failed for charge (4+).
>> >> > WARNING: Mixture model quality test failed for charge (5+).
>> >> > WARNING: Mixture model quality test failed for charge (6+).
>> >> > WARNING: Mixture model quality test failed for charge (7+).
>> >> > model complete after 26 iterations
>> >> > [r...@compute-3-0 run-on-compute]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/RefreshParser 'jagan-J128.pepprophet.xml' '/home/APCF/databases/SwissProt/uniprot_sprot_Jan2009/decoy/decoy_uniprot_sprot.fasta'
>> >> > - Building Commentz-Walter keyword tree... - Searching the tree...
>> >> > - Linking duplicate entries... - Printing results...
>> >> >
>> >> > [r...@compute-3-0 run-on-compute]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/ProteinProphet 'jagan-J128.pepprophet.xml' 'jagan-J128.prot.xml'
>> >> > ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v4.3 JETSTREAM rev 1, Build 201003241044 (linux))
>> >> > (xml input) (report Protein Length) (using degen pep info)
>> >> > . . . reading in /mnt/sanfs/APCF/results/omssa/decoy_test_run/run-on-compute/jagan-J128.pepprophet.xml. . .
>> >> > . . . read in 0 1+, 0 2+, 33 3+, 0 4+, 0 5+, 0 6+, 0 7+ spectra with min prob 0.05
>> >> > Could not find/open font when opening font "arial", using internal non-scalable font
>> >> > INFO: mu=6.3014e-09, db_size=584667857
>> >> >
>> >> > protein probabilities written to file /mnt/sanfs/APCF/results/omssa/decoy_test_run/run-on-compute/jagan-J128.prot.xml
>> >> > direct your browser to http://nfs//mnt/sanfs/APCF/results/omssa/decoy_test_run/run-on-compute/jagan-J128.prot.shtml
>> >> >
>> >> > [r...@compute-3-0 run-on-compute]#
>> >> >
>> >> > -------------------------------------
>> >> >
>> >> > I noticed some warning messages indicating that some tests failed. How critical are these messages?
>> >> >
>> >> > with regards,
>> >> >
>> >> > Dr. Jagan Kommineni
>> >> > Ludwig Institute for Cancer research
>> >> > Pakville VIC 3145
>> >> > Australia.
>> >> >
>> >> > --
>> >> > You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
>> >> > To post to this group, send email to [email protected].
>> >> > To unsubscribe from this group, send email to [email protected].
>> >> > For more options, visit this group at http://groups.google.com/group/spctools-discuss?hl=en.
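The point made earlier in the thread, that decoy matches with significant scores are common, is what makes decoys useful as an error estimate. Below is a minimal sketch of the simplest target-decoy estimate (decoy hits divided by target hits at or above a score threshold), assuming target and decoy databases of equal size and scores where higher is better. This is not what PeptideProphet's mixture model computes; the function name and the sample scores are made up for illustration.

```python
def decoy_fdr(hits, threshold):
    """Estimate the false discovery rate among target hits.

    `hits` is an iterable of (score, is_decoy) pairs, where a higher
    score means a better match. Returns decoys / targets for hits with
    score >= threshold, or None when no target hit passes.
    """
    passing = [(score, is_decoy) for score, is_decoy in hits if score >= threshold]
    decoys = sum(1 for _, is_decoy in passing if is_decoy)
    targets = len(passing) - decoys
    if targets == 0:
        return None
    return decoys / targets


# Hypothetical scored matches: (score, is_decoy).
sample_hits = [
    (0.99, False),
    (0.95, False),
    (0.93, True),   # a decoy can still score well
    (0.90, False),
    (0.40, True),
    (0.35, False),
]

for cutoff in (0.9, 0.3):
    print(cutoff, decoy_fdr(sample_hits, cutoff))
```

Raising the cutoff trades sensitivity for a lower estimated error rate; without decoys in the search there is nothing to count in the numerator, so no estimate of this kind is possible.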
