[spctools-discuss] Re: msconvert conversion of .wiff file to mzXML presents scans out of order...

Paul Bergen Thu, 27 Jan 2011 08:08:02 -0800

Sorry to interrupt the flow of ideas, but for the newcomer to these
file conversion issues:  Is there a tool that I could use to validate
a conversion from a .wiff file to mzXML?


thanks

Paul

On Jan 26, 7:43 pm, Nathan Edwards <[email protected]> wrote:
> Whoa. I am looking at the results of three different .wiff file
> processing tools and an acquisition sample, and they all appear to
> exhibit the same behavior with respect to repeated retention times and
> duplicated MS1 spectra (suggests the problem is in Analyst or the
> Analyst API) :-(
>
> Here is what I've observed. If the scans are sorted by retention time,
> I observe that sometimes the last MS/MS experiment of a cycle has the
> same retention time as the MS experiment of the next cycle. The MS/MS
> spectrum has data in it. The MS spectrum data appears to be a repeat
> of the previous cycle's MS experiment. Even mzWiff, which gives
> each experiment of a cycle the same retention time, seems to have the
> repeated MS scan if looked up directly.
>
> Analyst (2.0 here) doesn't make it easy to figure out what the right
> answer is, but it helped me form a hypothesis.
>
> Some notation - let sij be spectrum in cycle i, experiment j, and
> presume cycles consist of an MS spectrum (exp 1) and an MS/MS spectrum
> (exp 2).
>
> So in order, s11 -> s12 -> s21 -> s22, and I've observed that the
> retention times satisfy rt(s12) == rt(s21), and that spectrum(s11) ==
> spectrum(s21). Also, precursorMz(s12) == precursorMz(s22).
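
The two observations above (a shared retention time across a cycle boundary, and an MS1 spectrum repeated from the previous cycle) could be checked mechanically. Below is a minimal Python sketch, not tied to any converter: the Spectrum record, its field names, and the find_anomalies helper are all hypothetical, and it assumes the spectra have already been parsed into memory with their cycle and experiment numbers.

```python
from collections import namedtuple

# Hypothetical record: cycle, experiment, retention time, MS level, peak tuple.
Spectrum = namedtuple("Spectrum", "cycle experiment rt ms_level peaks")

def find_anomalies(spectra):
    """Return (shared_rt, repeated_ms1): lists of cycle numbers where
    - the MS1 scan shares its retention time with the last MS/MS scan
      of the previous cycle, i.e. rt(s12) == rt(s21);
    - the MS1 peaks equal the previous cycle's MS1 peaks,
      i.e. spectrum(s11) == spectrum(s21)."""
    ms1_by_cycle = {s.cycle: s for s in spectra if s.ms_level == 1}
    last_by_cycle = {}
    for s in spectra:
        prev = last_by_cycle.get(s.cycle)
        if prev is None or s.experiment > prev.experiment:
            last_by_cycle[s.cycle] = s  # highest experiment seen in this cycle

    shared_rt, repeated_ms1 = [], []
    for cycle in sorted(ms1_by_cycle):
        cur = ms1_by_cycle[cycle]
        prev_last = last_by_cycle.get(cycle - 1)
        prev_ms1 = ms1_by_cycle.get(cycle - 1)
        if prev_last is not None and prev_last.ms_level > 1 and prev_last.rt == cur.rt:
            shared_rt.append(cycle)
        if prev_ms1 is not None and prev_ms1.peaks == cur.peaks:
            repeated_ms1.append(cycle)
    return shared_rt, repeated_ms1
```

Exact equality on retention times and peaks is used here because the observation is about identical values; a tolerance may be needed if a converter rounds them differently.
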
>
> In Analyst, it appears that spectrum(s22) is displayed when looking at
> the s11 and s12 pair (with their retention times). Retention times
> corresponding to s21 and s22 are not shown in the IDA explorer view.
>
> What I think is happening is that MS scan s11 is taken,
> precursorMz(s12) is selected and the acquisition of s12 is started.
> However, time runs out (?) before enough signal is collected. s12 is
> filled in with the current data when time runs out, and spectral
> acquisition is continued. s22 represents the "2x acquisition time" of
> s12 and holds the accumulation of two scans' worth of data. s21 is
> filled in with s11's data and the spectral data in s22 is presented
> with s12's meta-data.
>
> Now we'd need LifeTech/ABI/MDS/Sciex to confirm or deny, but if all of
> this is correct, the easiest fix would be to drop s12 and s21, but it
> is unclear how all of this generalizes with more MS/MS experiments per
> cycle with perhaps only experiment 3 requiring more time. Sigh.
>
> - n
>
> On Jan 25, 10:31 am, Nathan Edwards <[email protected]> wrote:
>
>
>
> > I'm getting back to analyzing this issue. Note that mzWiff outputs in
> > cycle major order (all experiments of each cycle in order), as opposed
> > to msconvert and the ABI tool. Furthermore, mzWiff's conversion time
> > for my test case wiff/wiff.scan file (15 sample file of approximately
> > 27K spectra, about 120Mb .wiff/wiff.scan file, no peak detection) is
> > about the same as for msconvert (about 10mins each) despite the cycle
> > major order of output.
>
> > These results suggest the API isn't always slower for cycle major
> > order. Perhaps the API can do random file seeks if there is
> > a .wiff.scan file, and not if there isn't? Dunno. However, this does
> > suggest there are cases where cycle major is no slower than experiment
> > major order...
>
> > I'll be looking into the repeat retention time issue and the apparent
> > duplicated spectrum next.
>
> > - n
>
> > On Jan 19, 2:25 pm, Nathan Edwards <[email protected]> wrote:
>
> > > Ugh. I was worried it was due to efficiency issues with the vendor
> > > API.
>
> > > Sigh. Regardless of whether the scan numbers are real or made-up, I
> > > think that the non-chronological order of the scans in the file is an
> > > issue. I suspect others will be surprised by this too.
>
> > > At the time of conversion it is possible to read in one order and write
> > > in another without having to re-sort globally (read from # experiments
> > > "caches" in turn), but without an experiment annotation in the spectra
> > > metadata, a global retentionTime sort is the only robust alternative I
> > > can think of (though a linear-time merge of the # experiments
> > > monotonically increasing runs is doable, I guess). There are also
> > > retentionTime repeats (empty spectra before the real spectrum with the
> > > correct retention time). More about this next.
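
The merge of per-experiment runs mentioned above can be sketched with a k-way merge. This is only an illustration in Python: it assumes each spectrum is a hypothetical (experiment, rt, payload) tuple, that the input is in experiment-major order, and that retention time increases within each experiment. The merge_runs_by_rt name is made up for this example.

```python
import heapq
from itertools import groupby

def merge_runs_by_rt(spectra):
    """Merge experiment-major spectra into retention-time order.

    spectra: list of (experiment, rt, payload) tuples, where all spectra of
    one experiment are contiguous and sorted by rt (assumed, not checked).
    """
    # Split the list into one already-sorted run per experiment.
    runs = [list(group) for _, group in groupby(spectra, key=lambda s: s[0])]
    # k-way merge of the runs on retention time; O(n log k) for k experiments.
    return list(heapq.merge(*runs, key=lambda s: s[1]))
```

With a small, fixed number of experiments this is close to linear in the number of spectra. Note that heapq.merge breaks ties in favour of the earlier run, so spectra with repeated retention times may not come out in acquisition order.
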
>
> > > How can I detect that the retention time is not monotonic without
> > > reading a large chunk of the file? I guess I can look for a magic
> > > string in the first 1K of the file (.wiff, Analyst) to decide whether
> > > to do this expensive check, and fix.
>
> > > Without explicit information in the .wiff file data structure,
> > > formally determining the precursor scan may not be possible, but the
> > > "cycle,experiment" grouping (as opposed to experiment,cycle) will
> > > capture the right relationships by chronology for the vast majority of
> > > LC-MS/MS datasets.
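
The "cycle,experiment" grouping described above could look roughly like the following Python sketch. It assumes the cycle and experiment numbers are available for every spectrum, which is exactly what the mzXML output lacks, so the dict keys and the assign_precursor_scans helper are hypothetical. The precursor is taken to be the most recent MS1 scan in cycle-major order, which is a chronological guess rather than something the .wiff file confirms.

```python
def assign_precursor_scans(spectra):
    """Reorder spectra cycle-major and link each MS/MS scan to the
    preceding MS1 scan.

    spectra: iterable of dicts with 'cycle', 'experiment' and 'ms_level'
    keys (hypothetical field names). Returns new dicts in cycle-major
    order, each with a 'precursor_index' key: the position of the
    presumed precursor MS1 scan in the returned list, or None.
    """
    ordered = sorted(spectra, key=lambda s: (s["cycle"], s["experiment"]))
    result = []
    last_ms1 = None
    for index, spec in enumerate(ordered):
        spec = dict(spec)  # copy so the caller's records are not modified
        if spec["ms_level"] == 1:
            last_ms1 = index
            spec["precursor_index"] = None
        else:
            spec["precursor_index"] = last_ms1
        result.append(spec)
    return result
```
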
>
> > > - n
>
> > > On Jan 19, 12:49 pm, Matthew Chambers <[email protected]>
> > > wrote:
>
> > > > > I am well aware of this issue, but there's no schematic rule about
> > > > > the file being in retention time order. And there is no scan number
> > > > > for a WIFF scan (since it uses the arbitrary index that pwiz
> > > > > translates to, that part at least actually does increase
> > > > > monotonically). Use mzML and nativeID! :)
>
> > > > > The problem is the WiffFileReader API takes a relative eternity to
> > > > > switch between experiments. It's quite slow enough as it is. :)
> > > > > You'll be happy to hear that the new API does not have the same
> > > > > problem. With the current API it would be faster (except possibly
> > > > > with huge profile data) to first convert to XML and then use a
> > > > > sorting filter to convert the XML to another file sorted by
> > > > > retention time. Currently there is a sorting filter, but no
> > > > > built-in predicates that use it are accessible from the
> > > > > command-line.
>
> > > > > I'm not actually sure HOW to tell which scan is the precursor scan.
> > > > > In Thermo, figuring out the precursor scan with certainty without
> > > > > parsing the scan event list (which comes in a fascinating variety
> > > > > of formats) can be quite tricky. I don't know if the same problems
> > > > > exist in ABI and there's no scan event list to check (AFAIK), so I
> > > > > punted.
>
> > > > -Matt
>
> > > > On 1/19/2011 11:34 AM, Nathan Edwards wrote:
>
> > > > > > I've had this problem with a variety of tools and their handling of
> > > > > > .wiff data files from Analyst, and now having gotten msconvert to
> > > > > > work (thanks Matt) I was hoping that msconvert did it "right".
>
> > > > > Unfortunately, it doesn't seem so.
>
> > > > > > I believe that the scan number and retention times should increase
> > > > > > monotonically in the mzXML file, and in a tandem mass-spectrometry
> > > > > > experiment I expect the MS1 scan to be immediately followed by the
> > > > > > MS/MS scans whose precursors are derived from the MS1 scan.
>
> > > > > > A number (n >= 2) of converters (msconvert & ABI's) for .wiff files do
> > > > > not respect this file structure and output the spectra by experiment
> > > > > and cycle, with all experiment 1 (MS1 spectra) first, then all
> > > > > experiment 2 (MS/MS from first selected precursor peak from each MS1
> > > > > spectrum), then all experiment 3, etc.
>
> > > > > In the msconvert mzXML output, there isn't even any reference in the
> > > > > MS2 spectra to assist in determining the correct MS1 spectrum to
> > > > > associate with the MS2 spectrum.
>
> > > > > It is possible to use various tricks to try and determine cycle,
> > > > > experiment, and MS1/MS2 relationships but at the least these require
> > > > > sorting (globally) on retentionTime, an expensive proposition for
> > > > > large mzXML files.
>
> > > > > I'd be happy to provide an example mzXML output to demonstrate the
> > > > > issue.
>
> > > > > > - n
>

