OK, sorry, interrupted. Here is an example of the problem with retentionTime sorting (and a msconvert/pwiz bug!):

msLevel  retentionTime  precursorMz    peaksCount  basePeakMz     startMz        endMz          num
1        1510.83        -              1263        371.082032334  225.102565257  1197.95325018  1031
1        1511.94        -              1257        473.596738128  223.037754119  1197.62626587  1032
1        1513.03        -              1281        473.596738128  223.03211017   1198.92138532  1033
2        1513.85        570.260399924  93          729.337882586  155.163041064  1393.62827746  1607
1        1513.85        -              1281        473.596738128  223.03211017   1198.92138532  1034
2        1514.6         570.260399924  121         570.260399924  175.104021303  1201.47442517  1608
1        1515.62        -              1273        371.082032334  223.037754119  1199.18311059  1035
1        1516.71        -              1232        371.089312373  217.869931256  1197.83553069  1036
1        1517.79        -              1159        371.082032334  221.094829342  1193.8887412   1037
1        1518.91        -              1181        371.082032334  223.043398139  1199.03915816  1038

I'm sure the table will be foobar'ed by a proportional font. Sigh.

Retention time 1513.85 is mentioned twice, I don't know how to
interpret this. I think MS1 scan num 1034 is empty in the wiff file
(gets the retention time of next spectrum as it takes no time to
collect), and its spectrum is a carry over from scan 1033 (notice that
the spectrum mz metadata is identical for basePeakMz, startMz, and
endMz, very unlikely). Base 64 spectral data for 1033 and 1034 are
identical too.

That said, why are we taking a MS2 spectrum (1608) if we don't take a
MS1 spectrum prior to it?

- n

On Jan 19, 2:25 pm, Nathan Edwards <[email protected]> wrote:
> Ugh. I was worried it was due to efficiency issues with the vendor
> API.
>
> Sigh. Regardless of whether the scan numbers are real or made-up, I
> think that the non-chronological order of the scans in the file is an
> issue. I suspect others will be surprised by this too.
>
> At the time of conversion it is possible to read in one way and write
> in another without having to re-sort globally (read from # experiments
> "caches" in turn) but without an experiment annotation in the spectra
> metadata, a global retentionTime sort is the only robust alternative I
> can think of (though linear time merge sort for # experiments
> monotonically increasing runs is doable, I guess). There are
> retentionTime repeats (empty spectra before the real spectrum with the
> correct retention time). More about this next.
>
> How can I detect that the retention time is not monotonic without
> reading a large chunk of the file? I guess I can look for a magic
> string in the first 1K of the file (.wiff, Analyst) to decide whether
> to do this expensive check, and fix.
>
> Without explicit information in the .wiff file data structure,
> formally determining the precursor scan may not be possible, but the
> "cycle,experiment" grouping (as opposed to experiment,cycle) will
> capture the right relationships by chronology for the vast majority of
> LC-MS/MS datasets.
>
> - n
>
> On Jan 19, 12:49 pm, Matthew Chambers <[email protected]> wrote:
>
> > I am well aware of this issue, but there's no schematic rule about the
> > file being in retention time order. And there is no scan number for a
> > WIFF scan (since it uses the arbitrary index that pwiz translates to,
> > that part at least actually does increase monotonically). Use mzML and
> > nativeID! :)
>
> > The problem is the WiffFileReader API takes a relative eternity to
> > switch between experiments. It's quite slow enough as it is. :) You'll
> > be happy to hear that the new API does not have the same problem. With
> > the current API it would be faster (except possibly with huge profile
> > data) to first convert to XML and then use a sorting filter to convert
> > the XML to another file sorted by retention time.
> > Currently there is a sorting filter, but no built-in predicates that
> > use it are accessible from the command-line.
>
> > I'm not actually sure HOW to tell which scan is the precursor scan. In
> > Thermo, figuring out the precursor scan with certainty without parsing
> > the scan event list (which comes in a fascinating variety of formats)
> > can be quite tricky. I don't know if the same problems exist in ABI and
> > there's no scan event list to check (AFAIK), so I punted.
>
> > -Matt
>
> > On 1/19/2011 11:34 AM, Nathan Edwards wrote:
>
> > > I've had this problem with a variety of tools and their handling of
> > > .wiff data files from Analyst, and now having gotten msconvert to work
> > > (thanks Matt) I was hoping that msconvert did it "right".
>
> > > Unfortunately, it doesn't seem so.
>
> > > I believe that the scan number and retention times should increase
> > > monotonically in the mzXML file, and in a tandem mass-spectrometry
> > > experiment I expect the MS1 scan to be immediately followed by the
> > > MS/MS scans whose precursors are derived from the MS1 scan.
>
> > > A number (n >= 2) of converters (msconvert & ABI's) for .wiff files do
> > > not respect this file structure and output the spectra by experiment
> > > and cycle, with all experiment 1 (MS1 spectra) first, then all
> > > experiment 2 (MS/MS from first selected precursor peak from each MS1
> > > spectrum), then all experiment 3, etc.
>
> > > In the msconvert mzXML output, there isn't even any reference in the
> > > MS2 spectra to assist in determining the correct MS1 spectrum to
> > > associate with the MS2 spectrum.
>
> > > It is possible to use various tricks to try and determine cycle,
> > > experiment, and MS1/MS2 relationships but at the least these require
> > > sorting (globally) on retentionTime, an expensive proposition for
> > > large mzXML files.
>
> > > I'd be happy to provide an example mzXML output to demonstrate the
> > > issue.
>
> > > - n

--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/spctools-discuss?hl=en.
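
For what it's worth, here is a minimal Python sketch of the merge idea quoted above: if each experiment's spectra are already in increasing retentionTime order, they can be merged chronologically in one pass instead of a global sort, and each MS2 can then be attached to the most recent MS1 by chronology. The names (Scan, merge_runs, assign_precursor_scans) are made up for illustration and are not part of pwiz or any converter. Breaking retentionTime ties by msLevel is purely an assumption; the repeated 1513.85 in the table shows that ties do occur and may need different handling.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Scan(NamedTuple):
    """Hypothetical minimal record for one spectrum's metadata."""
    num: int                # index assigned by the converter
    ms_level: int           # 1 for MS1, 2 for MS/MS
    retention_time: float   # as reported in the converted file


def merge_runs(runs: Iterable[Iterable[Scan]]) -> Iterator[Scan]:
    """Merge per-experiment runs into one chronological stream.

    Each run must already be sorted by retention_time. Ties between
    runs are broken by ms_level (MS1 first), which is an assumption.
    """
    return heapq.merge(*runs, key=lambda s: (s.retention_time, s.ms_level))


def assign_precursor_scans(
    scans: Iterable[Scan],
) -> Iterator[Tuple[Scan, Optional[int]]]:
    """Pair each MS2 scan with the num of the most recent MS1 before it."""
    last_ms1: Optional[int] = None
    for scan in scans:
        if scan.ms_level == 1:
            last_ms1 = scan.num
            yield scan, None
        else:
            yield scan, last_ms1


if __name__ == "__main__":
    # A few rows from the table, grouped the way the converter writes them:
    # all of experiment 1 (MS1) first, then experiment 2 (MS/MS).
    ms1 = [Scan(1033, 1, 1513.03), Scan(1034, 1, 1513.85), Scan(1035, 1, 1515.62)]
    ms2 = [Scan(1607, 2, 1513.85), Scan(1608, 2, 1514.6)]
    for scan, precursor in assign_precursor_scans(merge_runs([ms1, ms2])):
        print(scan.num, scan.ms_level, scan.retention_time, precursor)
```

This only helps at conversion time, or when the per-experiment runs can be located in the file; without an experiment annotation in the output it does not remove the need for a global sort, and it does nothing about the carry-over spectrum in scan 1034.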
