Sorry to interrupt the flow of ideas, but as a newcomer to these file conversion issues: is there a tool that I could use to validate a conversion from a .wiff file to mzXML?
Thanks,
Paul

On Jan 26, 7:43 pm, Nathan Edwards <[email protected]> wrote:
> Whoa. I am looking at the results of three different .wiff file
> processing tools and an acquisition sample, and they all appear to
> exhibit the same behavior with respect to repeated retention times and
> duplicated MS1 spectra (suggests the problem is in Analyst or the
> Analyst API) :-(
>
> Here is what I've observed. If the scans are sorted by retention time,
> I observe that sometimes the last MS/MS experiment of a cycle has the
> same retention time as the MS experiment of the next cycle. The MS/MS
> spectrum has data in it. The MS spectrum data appears to be a repeat
> of the previous cycle's MS experiment. Even mzWiff, which gives
> each experiment of a cycle the same retention time, seems to have the
> repeated MS scan if looked up directly.
>
> Analyst (2.0 here) doesn't make it easy to figure out what the right
> answer is, but it helped me form a hypothesis.
>
> Some notation - let sij be the spectrum in cycle i, experiment j, and
> presume cycles consist of an MS spectrum (exp 1) and an MS/MS spectrum
> (exp 2).
>
> So in order, s11 -> s12 -> s21 -> s22, and I've observed that
> rt(s12) == rt(s21), and that spectrum(s11) ==
> spectrum(s21). Also, precursorMz(s12) == precursorMz(s22).
>
> In Analyst, it appears that spectrum(s22) is displayed when looking at
> the s11 and s12 pair (with their retention times). Retention times
> corresponding to s21 and s22 are not shown in the IDA explorer view.
>
> What I think is happening is that MS scan s11 is taken,
> precursorMz(s12) is selected and the acquisition of s12 is started.
> However, time runs out (?) before enough signal is collected. s12 is
> filled in with the current data when time runs out, and spectral
> acquisition is continued. s22 represents the "2x acquisition time" of
> s12 and holds the accumulation of two scans' worth of data.
> s21 is filled in with s11's data and the spectral data in s22 is
> presented with s12's meta-data.
>
> Now we'd need LifeTech/ABI/MDS/Sciex to confirm or deny, but if all of
> this is correct, the easiest fix would be to drop s12 and s21, but it
> is unclear how all of this generalizes with more MS/MS experiments per
> cycle with perhaps only experiment 3 requiring more time. Sigh.
>
> - n
>
> On Jan 25, 10:31 am, Nathan Edwards <[email protected]> wrote:
>
> > I'm getting back to analyzing this issue. Note that mzWiff outputs in
> > cycle major order (all experiments of each cycle in order), as opposed
> > to msconvert and the ABI tool. Furthermore, mzWiff's conversion time
> > for my test case wiff/wiff.scan file (15 sample file of approximately
> > 27K spectra, about 120Mb .wiff/wiff.scan file, no peak detection) is
> > about the same as for msconvert (about 10mins each) despite the cycle
> > major order of output.
> >
> > These results suggest the API isn't always slower for cycle major
> > order. Perhaps the API can do random file seeks if there is
> > a .wiff.scan file, and not if there isn't? Dunno. However, this does
> > suggest there are cases where cycle major is no slower than experiment
> > major order...
> >
> > I'll be looking into the repeat retention time issue and the apparent
> > duplicated spectrum next.
> >
> > - n
> >
> > On Jan 19, 2:25 pm, Nathan Edwards <[email protected]> wrote:
> >
> > > Ugh. I was worried it was due to efficiency issues with the vendor
> > > API.
> > >
> > > Sigh. Regardless of whether the scan numbers are real or made-up, I
> > > think that the non-chronological order of the scans in the file is an
> > > issue. I suspect others will be surprised by this too.
> > >
> > > At the time of conversion it is possible to read in one way and write
> > > in another without having to re-sort globally (read from # experiments
> > > "caches" in turn) but without an experiment annotation in the spectra
> > > metadata, a global retentionTime sort is the only robust alternative I
> > > can think of (though linear time merge sort for # experiments
> > > monotonically increasing runs is doable, I guess). There are
> > > retentionTime repeats (empty spectra before the real spectrum with the
> > > correct retention time). More about this next.
> > >
> > > How can I detect that the retention time is not monotonic without
> > > reading a large chunk of the file? I guess I can look for a magic
> > > string in the first 1K of the file (.wiff, Analyst) to decide whether
> > > to do this expensive check, and fix.
> > >
> > > Without explicit information in the .wiff file data structure,
> > > formally determining the precursor scan may not be possible, but the
> > > "cycle,experiment" grouping (as opposed to experiment,cycle) will
> > > capture the right relationships by chronology for the vast majority of
> > > LC-MS/MS datasets.
> > >
> > > - n
> > >
> > > On Jan 19, 12:49 pm, Matthew Chambers <[email protected]>
> > > wrote:
> > >
> > > > I am well aware of this issue, but there's no schematic rule about
> > > > the file being in retention time order. And there is no scan number
> > > > for a WIFF scan (since it uses the arbitrary index that pwiz
> > > > translates to, that part at least actually does increase
> > > > monotonically). Use mzML and nativeID! :)
> > > >
> > > > The problem is the WiffFileReader API takes a relative eternity to
> > > > switch between experiments. It's quite slow enough as it is. :)
> > > > You'll be happy to hear that the new API does not have the same
> > > > problem.
> > > > With the current API it would be faster (except possibly with
> > > > huge profile data) to first convert to XML and then use a sorting
> > > > filter to convert the XML to another file sorted by retention
> > > > time. Currently there is a sorting filter, but no built-in
> > > > predicates that use it are accessible from the command-line.
> > > >
> > > > I'm not actually sure HOW to tell which scan is the precursor scan.
> > > > In Thermo, figuring out the precursor scan with certainty without
> > > > parsing the scan event list (which comes in a fascinating variety
> > > > of formats) can be quite tricky. I don't know if the same problems
> > > > exist in ABI and there's no scan event list to check (AFAIK), so I
> > > > punted.
> > > >
> > > > -Matt
> > > >
> > > > On 1/19/2011 11:34 AM, Nathan Edwards wrote:
> > > >
> > > > > I've had this problem with a variety of tools and their handling
> > > > > of .wiff data files from Analyst, and now having gotten msconvert
> > > > > to work (thanks Matt) I was hoping that msconvert did it "right".
> > > > >
> > > > > Unfortunately, it doesn't seem so.
> > > > >
> > > > > I believe that the scan number and retention times should increase
> > > > > monotonically in the mzXML file, and in a tandem mass-spectrometry
> > > > > experiment I expect the MS1 scan to be immediately followed by the
> > > > > MS/MS scans whose precursors are derived from the MS1 scan.
> > > > >
> > > > > A number (n >= 2) of converters (msconvert & ABI's) for .wiff files do
> > > > > not respect this file structure and output the spectra by experiment
> > > > > and cycle, with all experiment 1 (MS1 spectra) first, then all
> > > > > experiment 2 (MS/MS from first selected precursor peak from each MS1
> > > > > spectrum), then all experiment 3, etc.
> > > > >
> > > > > In the msconvert mzXML output, there isn't even any reference in the
> > > > > MS2 spectra to assist in determining the correct MS1 spectrum to
> > > > > associate with the MS2 spectrum.
> > > > >
> > > > > It is possible to use various tricks to try and determine cycle,
> > > > > experiment, and MS1/MS2 relationships but at the least these require
> > > > > sorting (globally) on retentionTime, an expensive proposition for
> > > > > large mzXML files.
> > > > >
> > > > > I'd be happy to provide an example mzXML output to demonstrate the
> > > > > issue.
> > > > >
> > > > > - n

--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/spctools-discuss?hl=en.
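
For anyone following the thread, the "merge sort over # experiments monotonically increasing runs" idea mentioned above can be sketched in a few lines of Python. This is only an illustration, not what msconvert, mzWiff, or the ABI tool actually do. The record layout (a tuple of retention time and a label) and the function names are made up for the example, and it assumes the spectra of each experiment are already in non-decreasing retention-time order, so that an experiment-major file is just a concatenation of sorted runs.

```python
import heapq


def split_into_runs(spectra):
    """Split (retention_time, label) records into maximal non-decreasing runs.

    A new run starts wherever the retention time goes backwards, which in an
    experiment-major file is where the next experiment begins (an assumption
    of this sketch, not something the file format guarantees).
    """
    runs = []
    for spectrum in spectra:
        if runs and spectrum[0] >= runs[-1][-1][0]:
            runs[-1].append(spectrum)
        else:
            runs.append([spectrum])
    return runs


def merge_runs_by_retention_time(spectra):
    """Merge the sorted runs into one chronological list.

    heapq.merge does a k-way merge, so no global sort is needed. On equal
    retention times, spectra from the earlier run come first.
    """
    runs = split_into_runs(spectra)
    return list(heapq.merge(*runs, key=lambda s: s[0]))


def is_non_decreasing(spectra):
    """Cheap check that retention times never go backwards."""
    return all(a[0] <= b[0] for a, b in zip(spectra, spectra[1:]))


if __name__ == "__main__":
    # Experiment-major input: all of experiment 1, then all of experiment 2.
    experiment_major = [
        (1.0, "c1e1"), (3.0, "c2e1"), (5.0, "c3e1"),
        (2.0, "c1e2"), (4.0, "c2e2"), (6.0, "c3e2"),
    ]
    for rt, label in merge_runs_by_retention_time(experiment_major):
        print(rt, label)
```

With k experiments the merge costs roughly O(n log k), which for a handful of experiments per cycle is close to linear. Note that the tie-breaking rule matters for files showing the repeated retention times discussed above: a tie is resolved in favor of the earlier run, which may or may not be the true acquisition order.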

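
The s11 -> s12 -> s21 -> s22 pattern described in the Jan 26 message can also be checked for mechanically. The sketch below is hypothetical: the Spectrum class and its fields are invented for the example, the spectra are assumed to be in chronological order already, and the "drop s12 and s21" fix is only the tentative suggestion from the thread, which the vendor has not confirmed. It flags an MS spectrum that shares its retention time with the MS/MS spectrum just before it and whose peaks repeat the previous MS spectrum.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical minimal spectrum record used only for this sketch."""
    label: str
    rt: float
    ms_level: int
    peaks: Tuple[Tuple[float, float], ...]  # (m/z, intensity) pairs
    precursor_mz: Optional[float] = None


def find_suspect_pairs(spectra: List[Spectrum]) -> List[Tuple[int, int]]:
    """Return (MS/MS index, MS index) pairs matching the observed pattern.

    A pair is flagged when an MS spectrum has the same retention time as the
    MS/MS spectrum immediately before it and the same peaks as the previous
    MS spectrum, i.e. rt(s12) == rt(s21) and spectrum(s11) == spectrum(s21).
    """
    suspects = []
    last_ms1 = None
    for i, spec in enumerate(spectra):
        if spec.ms_level != 1:
            continue
        prev = spectra[i - 1] if i > 0 else None
        if (last_ms1 is not None
                and prev is not None
                and prev.ms_level == 2
                and prev.rt == spec.rt
                and spec.peaks == last_ms1.peaks):
            suspects.append((i - 1, i))
        last_ms1 = spec
    return suspects


def drop_suspects(spectra: List[Spectrum]) -> List[Spectrum]:
    """Drop both members of every suspect pair (the tentative fix above)."""
    drop = {i for pair in find_suspect_pairs(spectra) for i in pair}
    return [s for i, s in enumerate(spectra) if i not in drop]


if __name__ == "__main__":
    s11 = Spectrum("s11", 10.0, 1, ((400.0, 5.0),))
    s12 = Spectrum("s12", 11.0, 2, ((200.0, 1.0),), 400.0)
    s21 = Spectrum("s21", 11.0, 1, ((400.0, 5.0),))
    s22 = Spectrum("s22", 12.0, 2, ((200.0, 2.0),), 400.0)
    print([s.label for s in drop_suspects([s11, s12, s21, s22])])
```

Exact equality on retention times and peak lists is used because that is what was observed in the thread. As noted there, it is unclear how this generalizes to cycles with more than one MS/MS experiment, so treat it as a diagnostic rather than a fix.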