Hi Andreas,

I suspect this problem only affected your sequest search, because of the offending spectrum I pointed out. I've corrected the PeptideProphet source so that in the future it will simply print a warning and ignore spectra where the encoded charge doesn't match the assumed charge. For now, you can get your sequest analysis to go through correctly by removing the offending spectrum. I need to debug the pepXML tool that generated the file to solve the underlying converter issue. Do you have the .out files available?
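In case it helps to locate such entries before rerunning the analysis, here is a minimal sketch in Python. It assumes the charge is encoded as the text after the last dot of the `spectrum` attribute, as in the entry quoted further down in this thread. The function name `find_charge_mismatches` and the sample document are illustrative and not part of the TPP; the second `spectrum_query` in the sample is invented to show a consistent case.

```python
import xml.etree.ElementTree as ET

# Illustrative input: the first entry is the one quoted in this thread; the
# second entry's attributes are invented to show a consistent case.
SAMPLE = """<msms_run_summary>
  <spectrum_query spectrum="20100422_04_control_07.c.10165.10165.10"
      start_scan="10165" end_scan="10165" precursor_neutral_mass="3602.2759"
      assumed_charge="1" index="307" retention_time_sec="5514.3"/>
  <spectrum_query spectrum="20100422_04_control_07.c.07700.07700.4"
      start_scan="7700" end_scan="7700" assumed_charge="4" index="308"/>
</msms_run_summary>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}spectrum_query'."""
    return tag.rsplit("}", 1)[-1]


def find_charge_mismatches(pepxml_text):
    """Return (spectrum, encoded_charge, assumed_charge) for each mismatch."""
    root = ET.fromstring(pepxml_text)
    mismatches = []
    for elem in root.iter():
        if local_name(elem.tag) != "spectrum_query":
            continue
        spectrum = elem.get("spectrum", "")
        assumed = elem.get("assumed_charge")
        # Assumption: the charge is the text after the last dot in the name.
        encoded = spectrum.rsplit(".", 1)[-1]
        if encoded != assumed:
            mismatches.append((spectrum, encoded, assumed))
    return mismatches


if __name__ == "__main__":
    for spectrum, encoded, assumed in find_charge_mismatches(SAMPLE):
        print(f"{spectrum}: encoded charge {encoded}, assumed charge {assumed}")
```

For a real file, the text could be read with `open(path, encoding="utf-8").read()` and passed to the same function. The check compares the two values as strings, so it only reports entries; removing them from the file is left as a manual step.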
Thanks,
-David

On Wed, Jun 2, 2010 at 3:37 PM, Andreas Quandt <[email protected]> wrote:
> hey david,
> there is another question, i would like to ask: when trying to trace the
> problem we also analyzed these mzxml files with other programs such as
> xtandem, mascot, and omssa. by looking at their results we did not observe
> any suspicious behavior. hence, do you think the problem you described does
> only occur when using sequest or is this a more general problem?
> cheers,
> andreas
>
> On Thu, Jun 3, 2010 at 12:24 AM, Andreas Quandt <[email protected]>
> wrote:
>>
>> hey david,
>> many thanks for looking into it!
>> i was wondering if you could point me to a source where conditions such as
>> not using a '.' character in the file's basename are described because i was
>> not aware of that.
>> many thanks in advance,
>> andreas
>>
>> On Wed, Jun 2, 2010 at 12:17 AM, David Shteynberg
>> <[email protected]> wrote:
>>>
>>> Hello Ludovic,
>>>
>>> I think the problem might be in the pepXML generated in your pipeline.
>>> The offending entry is in the file 20100422_04_control_07.c.pep.xml :
>>>
>>>
>>> <spectrum_query spectrum="20100422_04_control_07.c.10165.10165.10"
>>> start_scan="10165" end_scan="10165" precursor_neutral_mass="3602.2759"
>>> assumed_charge="1" index="307" retention_time_sec="5514.3">
>>>
>>>
>>> Looks like the assumed charge of 1+ doesn't match the encoded charge
>>> in the spectrum name (the number after the last dot, "10" in this
>>> case).
>>>
>>> If you simply delete this entry from the file the fvalues and
>>> probabilities should line up again. I will change the code to error
>>> out when this scenario is encountered in the future.
>>>
>>>
>>> To address the underlying issue it looks like the conversion to pepXML
>>> failed on this entry for some reason. How were the pepXML files
>>> generated? Can you send me the original .out files so I can test the
>>> Out2XML sequest pepXML converter tool? This could also be caused by
>>> the '.' character in the basename of your files. Only alphanumeric
>>> and underscore characters are allowed there traditionally.
>>>
>>> Thanks,
>>> -David
>>>
>>> On Mon, May 31, 2010 at 3:37 PM, David Shteynberg
>>> <[email protected]> wrote:
>>> > Hello Ludovic,
>>> >
>>> > Yes, I see that there is a difference in the output files. I think
>>> > the problem is that the output of probabilities and fvals is
>>> > misaligned from the spectra in the pep.xml file. You'll see that the
>>> > next spectrum has the correct value. This definitely points to a bug
>>> > in the 4.3 version that is quite dated now. We are close to releasing
>>> > a new version. Are you able to test the latest SVN trunk version of
>>> > the software on your system to check if the bug is still present
>>> > there?
>>> >
>>> > Thanks,
>>> > -David
>>> >
>>> > On Mon, May 31, 2010 at 1:53 AM, lgillet <[email protected]>
>>> > wrote:
>>> >> Hi David,
>>> >>
>>> >> No, I confirm again that I do find a difference upon running the
>>> >> xinteract command with the files in different orders (I confirm I also
>>> >> see those differences on TPP 4.3.1 installed on Unix and on
>>> >> WindowsXP).
>>> >> I have re-run the command on the same folder, on the same files, to
>>> >> avoid any confusion about file names or so. The result of the first
>>> >> command was named NormalOrder, and the second ScrambledOrder.
>>> >> I have made another zip file (xinteract-output.zip) with the output of
>>> >> my commands (such that you can check if I do anything wrong by running
>>> >> the command). Also at the end of the text file, I have copy-pasted
>>> >> some lines from the summary visualisation from the web interface, with
>>> >> different filter criteria.
>>> >> I also attached in the zip the output of the 2 interact.pep.xml.
>>> >> If you look at a diff between both files, you will find plenty of
>>> >> differences on some spectra.
For example, you can have a look at:
>>> >>
>>> >> 20100422_01_control_04.c.07416.07416.3 => fval = 4.88 Vs. 0.058 (in
>>> >> NormalOrder Vs. ScrambledOrder resp.)
>>> >> 20100422_01_control_05.c.08545.08545.3 => fval = 5.36 Vs. completely
>>> >> absent (in NormalOrder Vs. ScrambledOrder resp.)
>>> >>
>>> >> Finally, could you please run those 4 pep.xml files on your server on
>>> >> TPP 4.0 and TPP 4.3 by yourself?
>>> >> You may realize by yourself then that there are not only "some
>>> >> differences" but the differences (especially if you look at the decoy
>>> >> protein hits) are terrible.
>>> >>
>>> >> Thanks and let me know if something is still not clear.
>>> >>
>>> >> Ludovic
>>> >>
>>> >> On May 27, 7:33 pm, David Shteynberg <[email protected]>
>>> >> wrote:
>>> >>> Hi Ludovic,
>>> >>>
>>> >>> It is completely normal to expect some difference in the results
>>> >>> between versions of the software since the models may be slightly
>>> >>> different in a new version due to optimization, bug fixes and the
>>> >>> sort. Hopefully the new analysis is able to increase your correct
>>> >>> identifications at a set error rate.
>>> >>>
>>> >>> When I run your data through our 4.3.1 pipeline I get your result in
>>> >>> the scrambled analysis (regardless of the order in which I specify my
>>> >>> input pepxml files). The difference in *your* two analyses is due to
>>> >>> the difference in your input files. Here is the relevant info from
>>> >>> your two 4.3.1 analyses:
>>> >>>
>>> >>> interact-TPP-V4.3.pep.xml has 8931 spectra in charge 2+ that it
>>> >>> models:
>>> >>>
>>> >>> <mixture_model precursor_ion_charge="2" comments="using no. tolerable
>>> >>> trypsin term. [ntt] 0 data as pseudonegatives"
>>> >>> prior_probability="0.427" est_tot_correct="3830.1"
>>> >>> tot_num_spectra="8931" num_iterations="28">
>>> >>>
>>> >>> interact_TPP-V4.3_scrambled.pep.xml has 8929 spectra in charge 2+ that
>>> >>> it models:
>>> >>>
>>> >>> <mixture_model precursor_ion_charge="2" comments="using no. tolerable
>>> >>> trypsin term. [ntt] 0 data as pseudonegatives"
>>> >>> prior_probability="0.427" est_tot_correct="3829.1"
>>> >>> tot_num_spectra="8929" num_iterations="28">
>>> >>>
>>> >>> Since the inputs are different in the two analyses the results will be
>>> >>> different. Please verify that the inputs you are giving to the two
>>> >>> analyses in different order are *not identical*. Can you verify
>>> >>> this?
>>> >>>
>>> >>> Thanks,
>>> >>> -David
>>> >>>
>>> >>> On Thu, May 27, 2010 at 8:12 AM, lgillet <[email protected]>
>>> >>> wrote:
>>> >>> > Hi David,
>>> >>>
>>> >>> > TPP is installed on different servers in our Institute. I have
>>> >>> > re-uploaded a new file (lgillet_interact-again.zip) for which the TPP
>>> >>> > xinteract was performed on the same server and with different versions
>>> >>> > of the TPP. You can see that the results are still very different,
>>> >>> > even the scrambled case.
>>> >>> > Note that I used the version TPP v4.3 JETSTREAM rev 1, Build
>>> >>> > 201004201202 (linux); which I do not know if it is the same as the SVN
>>> >>> > TPP that you mentioned or a "nightly-built".
>>> >>> > - Can you re-confirm my results using your installation of TPP with my
>>> >>> > 4 pep.xml files?
>>> >>> > - Can you re-confirm the differences in decoy % using your
>>> >>> > installation of TPP between TPP V4.0 and V4.3 with my 4 pep.xml
>>> >>> > files?
>>> >>> > Thanks again,
>>> >>> > Ludovic
>>> >>>
>>> >>> > On May 26, 9:01 pm, David Shteynberg
>>> >>> > <[email protected]>
>>> >>> > wrote:
>>> >>> >> Hi Ludovic,
>>> >>>
>>> >>> >> I was unable to duplicate the different results on different order of
>>> >>> >> input using the latest version of SVN tpp or version 4.3.1. I noticed
>>> >>> >> that your two analyses point to different locations. Are you sure
>>> >>> >> that the files at these locations are identical?
>>> >>>
>>> >>> >> Thanks,
>>> >>> >> -David
>>> >>>
>>> >>> >> On Wed, May 26, 2010 at 10:47 AM, lgillet
>>> >>> >> <[email protected]> wrote:
>>> >>> >> > Hi David,
>>> >>>
>>> >>> >> > all my apologies, the rar file got corrupted probably during the
>>> >>> >> > upload (the original on my HD was fine).
>>> >>> >> > I have uploaded again a zip file this time:
>>> >>> >> > lgillet_pepxml-again2.zip
>>> >>> >> > I hope that works this time (after download, I can decompress it
>>> >>> >> > back).
>>> >>> >> > Thanks for having a look at this issue.
>>> >>> >> > Best,
>>> >>> >> > Ludovic
>>> >>>
>>> >>> >> > On May 25, 7:24 pm, David Shteynberg
>>> >>> >> > <[email protected]>
>>> >>> >> > wrote:
>>> >>> >> >> Hi Ludovic,
>>> >>>
>>> >>> >> >> It seems the file you uploaded lgillet_pepxml_for_TPP4.3.rar is
>>> >>> >> >> corrupted. At least I am unable to open it. Please upload
>>> >>> >> >> again.
>>> >>>
>>> >>> >> >> Thanks,
>>> >>> >> >> -David
>>> >>>
>>> >>> >> >> On Wed, May 19, 2010 at 2:54 AM, lgillet
>>> >>> >> >> <[email protected]> wrote:
>>> >>> >> >> > Hi David, Hi Natalie,
>>> >>>
>>> >>> >> >> > I just posted the 4 pepxml files which give me the most striking
>>> >>> >> >> > differences in results between TPP-V4.0 and TPP-V4.3:
>>> >>> >> >> > lgillet_pepxml_for_TPP4.3.rar.
I also posted the results
>>> >>> >> >> > (interact.pep.xml) which I obtain from running TPP-V4.0,
>>> >>> >> >> > TPP-V4.3 and TPP-V4.3 on scrambled file order (file #4>#3>#2>#1):
>>> >>> >> >> > lgillet_interact-results.rar.
>>> >>> >> >> > I really tried my best to figure out what the problem could be.
>>> >>> >> >> > Maybe you could re-run the same analyses (TPP-V4.0, TPP-V4.3,
>>> >>> >> >> > TPP-V4.3 with the scrambled file order) and let me know if you
>>> >>> >> >> > confirm my results or if there is something wrong maybe with the
>>> >>> >> >> > compiled version we have on our server (could still be a
>>> >>> >> >> > possibility).
>>> >>> >> >> > Finally, to answer Natalie's question, the differences are quite
>>> >>> >> >> > dramatic (in my opinion) between V4.0 and V4.3 (I would not have
>>> >>> >> >> > worried about 1-2% differences in IDs), but here, I am passing
>>> >>> >> >> > from 1% decoy (V4.0) to 23% decoy (V4.3) hits (at the same
>>> >>> >> >> > proba > 0.9). Also the number of unique peptides reported by V4.0
>>> >>> >> >> > and V4.3 is quite different (2150 and 3161 resp.). Finally, many
>>> >>> >> >> > decoy hits pulled up in V4.3 with a prob>0.9 have actually a very
>>> >>> >> >> > bad MS/MS spectrum and a very low prob<0.01 (only reported if you
>>> >>> >> >> > use -p0 option) on V4.0.
>>> >>>
>>> >>> >> >> > Have a look at those MS/MS spectra for example:
>>> >>>
>>> >>> >> >> > 20100422_04_control_07.c.07700.07700.4
>>> >>> >> >> > 20100422_04_control_07.c.02864.02864.3
>>> >>>
>>> >>> >> >> > Let me know if you need any extra information.
>>> >>>
>>> >>> >> >> > Thanks a lot for your help on that.
>>> >>>
>>> >>> >> >> > Best,
>>> >>>
>>> >>> >> >> > Ludovic
>>> >>>
>>> >>> >> >> > On May 18, 11:21 pm, Natalie Tasman
>>> >>> >> >> > <[email protected]>
>>> >>> >> >> > wrote:
>>> >>> >> >> >> Ludovic,
>>> >>>
>>> >>> >> >> >> Go ahead and post the files to the newsgroup's file area
>>> >>> >> >> >> (http://groups.google.com/group/spctools-discuss/files), and
>>> >>> >> >> >> hopefully one of the validation experts will take a look.
>>> >>>
>>> >>> >> >> >> I will point out that PeptideProphet uses random initialization
>>> >>> >> >> >> for its curve fitting (EM algorithm). So it's not out of the
>>> >>> >> >> >> question that you'd see some small differences between runs on
>>> >>> >> >> >> the same data files, regardless of the order. Can you provide
>>> >>> >> >> >> some measure of the differences between runs for the reordered
>>> >>> >> >> >> datasets?
>>> >>>
>>> >>> >> >> >> -Natalie
>>> >>>
>>> >>> >> >> >> On Tue, May 18, 2010 at 4:35 AM, lgillet
>>> >>> >> >> >> <[email protected]> wrote:
>>> >>> >> >> >> > Hi everybody,
>>> >>> >> >> >> > I recently encountered a "bug" I think when people in my lab
>>> >>> >> >> >> > installed the newest TPP (v4.3 JETSTREAM rev 1, Build
>>> >>> >> >> >> > 201004201202 (linux)), especially when I try to compare the
>>> >>> >> >> >> > results with v4.0, which was our former "benchmark" version.
>>> >>> >> >> >> > When searching the same 4 pep.xml files with v4.0 and v4.3, I
>>> >>> >> >> >> > get an incredible difference in the number of decoy hits. For
>>> >>> >> >> >> > example, with v4.0, p>0.9, I would get my "regular" 1% decoy,
>>> >>> >> >> >> > while with v4.3, p>0.9, I get above 25% of decoys?!??
>>> >>> >> >> >> > All the interact are run with the following options:
>>> >>> >> >> >> > xinteract -OApld -ddecoy *.pep.xml
>>> >>> >> >> >> > I could nail down the "problem" to the PeptideProphetParser
>>> >>> >> >> >> > which behaves very differently between v4.0 and v4.3, while
>>> >>> >> >> >> > InteractParser (which introduces the "is_rejected=1" tags) and
>>> >>> >> >> >> > RefreshParser do not influence the results.
>>> >>> >> >> >> > But at the moment, I do not know if it is an issue of the
>>> >>> >> >> >> > decoy statistical distribution of prophet or not...
>>> >>>
>>> >>> >> >> >> > One more thing that makes me even more suspicious is the fact
>>> >>> >> >> >> > that, only with TPP version 4.3, if you search those files in
>>> >>> >> >> >> > a different order (let's say: xinteract file1 file2 file3 Vs
>>> >>> >> >> >> > xinteract file3 file2 file1), you do get differences in the
>>> >>> >> >> >> > results as well?!?
>>> >>>
>>> >>> >> >> >> > I am willing to send the 4 pepxml where those observations are
>>> >>> >> >> >> > the most critical to David or Luis or anybody interested, but
>>> >>> >> >> >> > I truly believe that there might be something going wrong with
>>> >>> >> >> >> > the TPP v4.3.
>>> >>>
>>> >>> >> >> >> > Let me know to whom I should post the files.
>>> >>>
>>> >>> >> >> >> > Best regards,
>>> >>>
>>> >>> >> >> >> > Ludovic
>>> >>>
>>> >>> >> >> >> > --
>>> >>> >> >> >> > You received this message because you are subscribed to
>>> >>> >> >> >> > the Google Groups "spctools-discuss" group.
>>> >>> >> >> >> > To post to this group, send email to
>>> >>> >> >> >> > [email protected].
>>> >>> >> >> >> > To unsubscribe from this group, send email to
>>> >>> >> >> >> > [email protected].
>>> >>> >> >> >> > For more options, visit this group
>>> >>> >> >> >> > at http://groups.google.com/group/spctools-discuss?hl=en.
