Hey David,

many thanks for looking into it! Could you point me to a source that describes conditions such as not using a '.' character in a file's basename? I was not aware of that.
Many thanks in advance,
Andreas

On Wed, Jun 2, 2010 at 12:17 AM, David Shteynberg <[email protected]> wrote:
> Hello Ludovic,
>
> I think the problem might be in the pepXML generated in your pipeline.
> The offending entry is in the file 20100422_04_control_07.c.pep.xml:
>
> <spectrum_query spectrum="20100422_04_control_07.c.10165.10165.10"
> start_scan="10165" end_scan="10165" precursor_neutral_mass="3602.2759"
> assumed_charge="1" index="307" retention_time_sec="5514.3">
>
> Looks like the assumed charge of 1+ doesn't match the encoded charge
> in the spectrum name (the number after the last dot, "10" in this
> case).
>
> If you simply delete this entry from the file, the fvalues and
> probabilities should line up again. I will change the code to error
> out when this scenario is encountered in the future.
>
> To address the underlying issue: it looks like the conversion to pepXML
> failed on this entry for some reason. How were the pepXML files
> generated? Can you send me the original .out files so I can test the
> Out2XML sequest pepXML converter tool? This could also be caused by
> the '.' character in the basename of your files. Only alphanumeric
> and underscore characters are allowed there traditionally.
>
> Thanks,
> -David
>
> On Mon, May 31, 2010 at 3:37 PM, David Shteynberg <[email protected]> wrote:
> > Hello Ludovic,
> >
> > Yes, I see that there is a difference in the output files. I think
> > the problem is that the output of probabilities and fvals is
> > misaligned from the spectra in the pep.xml file. You'll see that the
> > next spectrum has the correct value. This definitely points to a bug
> > in the 4.3 version that is quite dated now. We are close to releasing
> > a new version. Are you able to test the latest SVN trunk version of
> > the software on your system to check if the bug is still present
> > there?
> >
> > Thanks,
> > -David
> >
> > On Mon, May 31, 2010 at 1:53 AM, lgillet <[email protected]> wrote:
> >> Hi David,
> >>
> >> No, I confirm again that I do find a difference upon running the
> >> xinteract command with the files in different orders (I confirm I also
> >> see those differences on TPP 4.3.1 installed on Unix and on
> >> Windows XP).
> >> I have re-run the command on the same folder, on the same files, to
> >> avoid any confusion about file names or the like. The result of the first
> >> command was named NormalOrder, and the second ScrambledOrder.
> >> I have made another zip file (xinteract-output.zip) with the output of
> >> my commands (so that you can check whether I am doing anything wrong when
> >> running the command). Also at the end of the text file, I have copy-pasted
> >> some lines from the summary visualisation from the web interface, with
> >> different filter criteria.
> >> I also attached in the zip the output of the 2 interact.pep.xml.
> >> If you look at a diff between both files, you will find plenty of
> >> differences on some spectra. For example, you can have a look at:
> >>
> >> 20100422_01_control_04.c.07416.07416.3 => fval = 4.88 vs. 0.058 (in
> >> NormalOrder vs. ScrambledOrder resp.)
> >> 20100422_01_control_05.c.08545.08545.3 => fval = 5.36 vs. completely
> >> absent (in NormalOrder vs. ScrambledOrder resp.)
> >>
> >> Finally, could you please run those 4 pep.xml files on your server on
> >> TPP 4.0 and TPP 4.3 yourself?
> >> You may then see for yourself that there are not only "some
> >> differences": the difference (especially if you look at the decoy
> >> protein hits) is terrible.
> >>
> >> Thanks and let me know if something is still not clear.
> >>
> >> Ludovic
> >>
> >> On May 27, 7:33 pm, David Shteynberg <[email protected]> wrote:
> >>> Hi Ludovic,
> >>>
> >>> It is completely normal to expect some difference in the results
> >>> between versions of the software, since the models may be slightly
> >>> different in a new version due to optimization, bug fixes and the
> >>> like. Hopefully the new analysis is able to increase your correct
> >>> identifications at a set error rate.
> >>>
> >>> When I run your data through our 4.3.1 pipeline I get your result in
> >>> the scrambled analysis (regardless of the order in which I specify my
> >>> input pepxml files). The difference in *your* two analyses is due to
> >>> the difference in your input files. Here is the relevant info from
> >>> your two 4.3.1 analyses:
> >>>
> >>> interact-TPP-V4.3.pep.xml has 8931 spectra in charge 2+ that it models:
> >>>
> >>> <mixture_model precursor_ion_charge="2" comments="using no. tolerable
> >>> trypsin term. [ntt] 0 data as pseudonegatives"
> >>> prior_probability="0.427" est_tot_correct="3830.1"
> >>> tot_num_spectra="8931" num_iterations="28">
> >>>
> >>> interact_TPP-V4.3_scrambled.pep.xml has 8929 spectra in charge 2+ that
> >>> it models:
> >>>
> >>> <mixture_model precursor_ion_charge="2" comments="using no. tolerable
> >>> trypsin term. [ntt] 0 data as pseudonegatives"
> >>> prior_probability="0.427" est_tot_correct="3829.1"
> >>> tot_num_spectra="8929" num_iterations="28">
> >>>
> >>> Since the inputs are different in the two analyses the results will be
> >>> different. Please verify that the inputs you are giving to the two
> >>> analyses in different order are *not identical*. Can you verify this?
> >>>
> >>> Thanks,
> >>> -David
> >>>
> >>> On Thu, May 27, 2010 at 8:12 AM, lgillet <[email protected]> wrote:
> >>> > Hi David,
> >>>
> >>> > TPP is installed in different servers in our Institute.
> >>> > I have re-uploaded a new file (lgillet_interact-again.zip) for which the TPP
> >>> > xinteract was performed on the same server and with different versions
> >>> > of the TPP. You can see that the results are still very different,
> >>> > even in the scrambled case.
> >>> > Note that I used the version TPP v4.3 JETSTREAM rev 1, Build
> >>> > 201004201202 (linux); I do not know whether it is the same as the SVN
> >>> > TPP that you mentioned or a nightly build.
> >>> > - Can you re-confirm my results using your installation of TPP with my
> >>> > 4 pep.xml files?
> >>> > - Can you re-confirm the differences in decoy % using your
> >>> > installation of TPP between TPP V4.0 and V4.3 with my 4 pep.xml files?
> >>> > Thanks again,
> >>> > Ludovic
> >>>
> >>> > On May 26, 9:01 pm, David Shteynberg <[email protected]> wrote:
> >>> >> Hi Ludovic,
> >>>
> >>> >> I was unable to duplicate the different results on different order of
> >>> >> input using the latest version of SVN tpp or version 4.3.1. I noticed
> >>> >> that your two analyses point to different locations. Are you sure
> >>> >> that the files at these locations are identical?
> >>>
> >>> >> Thanks,
> >>> >> -David
> >>>
> >>> >> On Wed, May 26, 2010 at 10:47 AM, lgillet <[email protected]> wrote:
> >>> >> > Hi David,
> >>>
> >>> >> > all my apologies, the rar file probably got corrupted during the
> >>> >> > upload (the original on my HD was fine).
> >>> >> > I have uploaded again, a zip file this time: lgillet_pepxml-again2.zip
> >>> >> > I hope that works this time (after download, I can decompress it
> >>> >> > back).
> >>> >> > Thanks for having a look at this issue.
> >>> >> > Best,
> >>> >> > Ludovic
> >>>
> >>> >> > On May 25, 7:24 pm, David Shteynberg <[email protected]> wrote:
> >>> >> >> Hi Ludovic,
> >>>
> >>> >> >> It seems the file you uploaded lgillet_pepxml_for_TPP4.3.rar is
> >>> >> >> corrupted. At least I am unable to open it.
> >>> >> >> Please upload again.
> >>>
> >>> >> >> Thanks,
> >>> >> >> -David
> >>>
> >>> >> >> On Wed, May 19, 2010 at 2:54 AM, lgillet <[email protected]> wrote:
> >>> >> >> > Hi David, Hi Natalie,
> >>>
> >>> >> >> > I just posted the 4 pepxml files which give me the most striking
> >>> >> >> > differences in results between TPP-V4.0 and TPP-V4.3:
> >>> >> >> > lgillet_pepxml_for_TPP4.3.rar. I also posted the results
> >>> >> >> > (interact.pep.xml) which I obtain from running TPP-V4.0, TPP-V4.3 and
> >>> >> >> > TPP-V4.3 on scrambled file order (file #4>#3>#2>#1):
> >>> >> >> > lgillet_interact-results.rar.
> >>> >> >> > I really tried my best to figure out what the problem could be.
> >>> >> >> > Maybe you could re-run the same analyses (TPP-V4.0, TPP-V4.3, TPP-V4.3
> >>> >> >> > with the scrambled file order) and let me know if you confirm my
> >>> >> >> > results or if there is something wrong maybe with the compiled version
> >>> >> >> > we have on our server (could still be a possibility).
> >>> >> >> > Finally, to answer Natalie's question, the differences are quite
> >>> >> >> > dramatic (in my opinion) between V4.0 and V4.3 (I would not have
> >>> >> >> > worried about 1-2% differences in IDs), but here, I am going from 1%
> >>> >> >> > decoy (V4.0) to 23% decoy (V4.3) hits (at the same proba > 0.9). Also
> >>> >> >> > the number of unique peptides reported by V4.0 and V4.3 is quite
> >>> >> >> > different (2150 and 3161 resp.). Finally, many decoy hits pulled up in
> >>> >> >> > V4.3 with a prob>0.9 actually have a very bad MS/MS spectrum and a
> >>> >> >> > very low prob<0.01 (only reported if you use the -p0 option) on V4.0.
> >>>
> >>> >> >> > Have a look at those MS/MS spectra for example:
> >>>
> >>> >> >> > 20100422_04_control_07.c.07700.07700.4
> >>> >> >> > 20100422_04_control_07.c.02864.02864.3
> >>>
> >>> >> >> > Let me know if you need any extra information.
> >>>
> >>> >> >> > Thanks a lot for your help on that.
> >>>
> >>> >> >> > Best,
> >>>
> >>> >> >> > Ludovic
> >>>
> >>> >> >> > On May 18, 11:21 pm, Natalie Tasman <[email protected]> wrote:
> >>> >> >> >> Ludovic,
> >>>
> >>> >> >> >> Go ahead and post the files to the newsgroup's file area
> >>> >> >> >> (http://groups.google.com/group/spctools-discuss/files), and hopefully
> >>> >> >> >> one of the validation experts will take a look.
> >>>
> >>> >> >> >> I will point out that PeptideProphet uses random initialization for
> >>> >> >> >> its curve fitting (EM algorithm). So it's not out of the question
> >>> >> >> >> that you'd see some small differences between runs on the same data
> >>> >> >> >> files, regardless of the order. Can you provide some measure of the
> >>> >> >> >> differences between runs for the reordered datasets?
> >>>
> >>> >> >> >> -Natalie
> >>>
> >>> >> >> >> On Tue, May 18, 2010 at 4:35 AM, lgillet <[email protected]> wrote:
> >>> >> >> >> > Hi everybody,
> >>> >> >> >> > I recently encountered what I think is a "bug" when people in my lab
> >>> >> >> >> > installed the newest TPP (v4.3 JETSTREAM rev 1, Build 201004201202 (linux)),
> >>> >> >> >> > especially when I compare the results to v4.0, which was our
> >>> >> >> >> > former "benchmark" version.
> >>> >> >> >> > When searching the same 4 pep.xml files with v4.0 and v4.3, I get an
> >>> >> >> >> > incredible difference in the number of decoy hits. For example, with v4.0,
> >>> >> >> >> > p>0.9, I would get my "regular" 1% decoy, while with v4.3, p>0.9, I
> >>> >> >> >> > get above 25% of decoys?!??
> >>> >> >> >> > All the interact runs were done with the following options:
> >>> >> >> >> > xinteract -OApld -ddecoy *.pep.xml
> >>> >> >> >> > I could narrow down the "problem" to the PeptideProphetParser, which
> >>> >> >> >> > behaves very differently between v4.0 and v4.3, while InteractParser
> >>> >> >> >> > (which introduces the "is_rejected=1" tags) and RefreshParser do not
> >>> >> >> >> > influence the results.
> >>> >> >> >> > But at the moment, I do not know if it is an issue of the decoy
> >>> >> >> >> > statistical distribution of prophet or not...
> >>>
> >>> >> >> >> > One more thing that makes me even more suspicious is the fact that,
> >>> >> >> >> > only with TPP version 4.3, if you search those files in a different
> >>> >> >> >> > order (let's say: xinteract file1 file2 file3 vs. xinteract file3 file2
> >>> >> >> >> > file1), you do get differences in the results as well?!?
> >>>
> >>> >> >> >> > I am willing to send the 4 pepxml where those observations are the
> >>> >> >> >> > most critical to David or Luis or anybody interested, but I truly
> >>> >> >> >> > believe that there might be something going wrong with the TPP v4.3.
> >>>
> >>> >> >> >> > Let me know to whom I should post the files.
> >>>
> >>> >> >> >> > Best regards,
> >>>
> >>> >> >> >> > Ludovic
> >>>
> >>> >> >> >> > --
> >>> >> >> >> > You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
> >>> >> >> >> > To post to this group, send email to [email protected].
> >>> >> >> >> > To unsubscribe from this group, send email to [email protected].
> >>> >> >> >> > For more options, visit this group at http://groups.google.com/group/spctools-discuss?hl=en.
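
For anyone who wants to look for similar entries in their own files: below is a minimal sketch of the two checks David describes in this thread (the charge encoded after the last dot of the spectrum name versus assumed_charge, and a basename limited to alphanumeric and underscore characters). It is not part of the TPP. The function name check_spectrum_queries, the assumption that spectrum names have the form basename.start_scan.end_scan.charge, and the wrapper element in the sample are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the basename is whatever precedes the trailing
# ".start_scan.end_scan.charge" fields of the spectrum name.
SAFE_BASENAME = re.compile(r"^[A-Za-z0-9_]+$")


def check_spectrum_queries(pepxml_text):
    """Return (spectrum, problem) pairs for suspicious spectrum_query entries."""
    problems = []
    root = ET.fromstring(pepxml_text)
    for elem in root.iter():
        # Strip any XML namespace so the check works with or without one.
        if elem.tag.rsplit("}", 1)[-1] != "spectrum_query":
            continue
        spectrum = elem.get("spectrum", "")
        parts = spectrum.rsplit(".", 3)
        if len(parts) != 4:
            problems.append((spectrum, "unexpected spectrum name format"))
            continue
        basename, _start, _end, encoded_charge = parts
        # The number after the last dot should equal assumed_charge.
        if encoded_charge != elem.get("assumed_charge"):
            problems.append((spectrum, "charge mismatch"))
        # Only alphanumeric and underscore characters in the basename.
        if not SAFE_BASENAME.match(basename):
            problems.append((spectrum, "basename has other characters"))
    return problems


# The entry quoted in this thread, wrapped in a stand-in parent element.
sample = """<msms_run_summary>
<spectrum_query spectrum="20100422_04_control_07.c.10165.10165.10"
 start_scan="10165" end_scan="10165" precursor_neutral_mass="3602.2759"
 assumed_charge="1" index="307" retention_time_sec="5514.3"/>
</msms_run_summary>"""

for spectrum, problem in check_spectrum_queries(sample):
    print(spectrum, "->", problem)
```

On the entry above this reports both a charge mismatch and a basename containing a '.' character. For a real pep.xml file, the text would have to be read from disk first, and for large files a streaming parser such as ET.iterparse is probably a better fit.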
