[spctools-discuss] Re: msconvert conversion of .wiff file to mzXML presents scans out of order...

Nathan Edwards Wed, 19 Jan 2011 13:15:06 -0800

What do you mean by validation?

I have a copy of Analyst installed. I'm confident it doesn't have two
scans at the same retention time.


- n

On Jan 19, 3:01 pm, Matthew Chambers <[email protected]>
wrote:
> The validation would have to come from Analyst, I suspect. Do you have an
> installation you can look at these cycles with? If there's a discrepancy
> between Analyst and WiffFileDataReader and there's a feasible way to hack
> around it, that could be done.
>
> -Matt
>
> On 1/19/2011 1:49 PM, Nathan Edwards wrote:
>
>
>
> > Retention time 1513.85 appears twice, and I don't know how to
> > interpret this. I think MS1 scan num 1034 is empty in the wiff file
> > (it gets the retention time of the next spectrum, as it takes no time
> > to collect), and its spectrum is a carry-over from scan 1033 (notice
> > that the spectrum m/z metadata is identical for basePeakMz, startMz,
> > and endMz, which is very unlikely by chance). The base64 spectral data
> > for 1033 and 1034 are identical too. That said, why are we taking an
> > MS2 spectrum (1608) if we don't take an MS1 spectrum prior to it?
>
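The carry-over check described above (an MS1 scan whose encoded peak data is identical to the previous MS1 scan's) can be sketched roughly as follows. This is only an illustration: the `Ms1Scan` record and the `suspected_carryovers` helper are hypothetical names, and it assumes the base64 peak strings have already been pulled out of the mzXML in file order.

```python
from typing import Iterable, List, NamedTuple, Optional


class Ms1Scan(NamedTuple):
    scan_num: int
    peaks_base64: str  # encoded peak list as it appears in the mzXML (assumed)


def suspected_carryovers(ms1_scans: Iterable[Ms1Scan]) -> List[int]:
    """Return scan numbers whose peak data repeats the previous MS1 scan."""
    suspects: List[int] = []
    previous: Optional[Ms1Scan] = None
    for scan in ms1_scans:
        # Two genuinely empty scans also compare equal, so skip empty
        # strings; only a repeated non-empty payload is flagged.
        if (
            previous is not None
            and scan.peaks_base64
            and scan.peaks_base64 == previous.peaks_base64
        ):
            suspects.append(scan.scan_num)
        previous = scan
    return suspects
```

A flagged scan is only a candidate: identical payloads are a strong hint, not proof, that the spectrum was copied rather than acquired.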
> > - n
>
> > On Jan 19, 2:25 pm, Nathan Edwards <[email protected]> wrote:
> >> Ugh. I was worried it was due to efficiency issues with the vendor
> >> API.
>
> >> Sigh. Regardless of whether the scan numbers are real or made-up, I
> >> think that the non-chronological order of the scans in the file is an
> >> issue. I suspect others will be surprised by this too.
>
> >> At the time of conversion it is possible to read in one order and write
> >> in another without having to re-sort globally (read from one "cache"
> >> per experiment in turn), but without an experiment annotation in the
> >> spectra metadata, a global retentionTime sort is the only robust
> >> alternative I can think of (though a linear-time merge of the
> >> monotonically increasing runs, one per experiment, is doable, I guess).
> >> There are retentionTime repeats (empty spectra before the real spectrum
> >> with the correct retention time). More about this next.
>
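A minimal sketch of the merge idea mentioned above, assuming the scans of each experiment are available as a run that is already sorted by retention time. The `Scan` record and `merge_runs` are hypothetical names, not part of msconvert or pwiz. Python's `heapq.merge` does a lazy k-way merge, which costs O(n log k) for k experiments, so it is effectively linear when the number of experiments is small.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Scan(NamedTuple):
    retention_time: float  # seconds
    experiment: int        # 1 = MS1 spectra, 2 and up = MS/MS experiments
    cycle: int


def merge_runs(runs: Iterable[Iterable[Scan]]) -> Iterator[Scan]:
    """Lazily merge per-experiment runs, each already sorted by retention time."""
    # Ties on retention time fall back to the experiment number, so an MS1
    # scan sharing a timestamp with an MS/MS scan is emitted first.
    return heapq.merge(*runs, key=lambda s: (s.retention_time, s.experiment))
```

Because the merge is lazy, only one pending scan per experiment is held in memory at a time, which is the point of avoiding a global sort.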
> >> How can I detect that the retention time is not monotonic without
> >> reading a large chunk of the file? I guess I can look for a magic
> >> string in the first 1K of the file (.wiff, Analyst) to decide whether
> >> to do this expensive check, and fix.
>
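The two-step check described above might look something like this. It is a sketch under stated assumptions: the marker strings are only the guess made in the message (nothing here confirms they appear in the first 1K of every converted file), and the function names are hypothetical.

```python
from typing import Iterable, Optional

# Assumed markers, taken from the guess in the message above.
WIFF_MARKERS = (b".wiff", b"Analyst")


def header_suggests_wiff(header: bytes) -> bool:
    """Cheap test on the leading bytes of a converted file."""
    return any(marker in header for marker in WIFF_MARKERS)


def sniff_file(path: str, size: int = 1024) -> bool:
    """Read only the first `size` bytes of the file and test them."""
    with open(path, "rb") as handle:
        return header_suggests_wiff(handle.read(size))


def first_out_of_order(retention_times: Iterable[float]) -> Optional[int]:
    """Return the position of the first retention time that decreases, or None."""
    previous = float("-inf")
    for index, rt in enumerate(retention_times):
        if rt < previous:
            return index  # stop at the first violation
        previous = rt
    return None
```

Note that with experiment-major output the first decrease only shows up where experiment 2 begins, so the expensive check still has to read past all of the experiment 1 spectra before it can stop.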
> >> Without explicit information in the .wiff file data structure,
> >> formally determining the precursor scan may not be possible, but the
> >> "cycle,experiment" grouping (as opposed to experiment,cycle) will
> >> capture the right relationships by chronology for the vast majority of
> >> LC-MS/MS datasets.
>
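To illustrate the "cycle,experiment" grouping, here is a rough sketch that assumes the cycle and experiment numbers can be recovered for every scan (for example from an mzML nativeID, as suggested elsewhere in this thread). The `Scan` record and `assign_precursors` are hypothetical names, and the rule "the precursor is the most recent MS1 scan" is the chronological heuristic described above, not something read from the .wiff file.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Scan(NamedTuple):
    index: int       # position in the converted file
    cycle: int
    experiment: int
    ms_level: int


def assign_precursors(scans: Iterable[Scan]) -> List[Tuple[int, Optional[int]]]:
    """Order scans by (cycle, experiment) and pair each MS/MS scan with the
    most recent MS1 scan. Returns (scan index, precursor scan index) pairs."""
    ordered = sorted(scans, key=lambda s: (s.cycle, s.experiment))
    last_ms1: Optional[int] = None
    pairs: List[Tuple[int, Optional[int]]] = []
    for scan in ordered:
        if scan.ms_level == 1:
            last_ms1 = scan.index
            pairs.append((scan.index, None))
        else:
            # Heuristic: the survey scan of the same cycle precedes this one.
            pairs.append((scan.index, last_ms1))
    return pairs
```

This gives the chronological guess only; a dataset where MS/MS precursors come from an earlier cycle's survey scan would be paired incorrectly.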
> >> - n
>
> >> On Jan 19, 12:49 pm, Matthew Chambers <[email protected]> wrote:
>
> >>> I am well aware of this issue, but there's no rule in the schema about
> >>> the file being in retention time order. And there is no scan number for
> >>> a WIFF scan (the output uses the arbitrary index that pwiz translates
> >>> to, so that part at least does increase monotonically). Use mzML and
> >>> nativeID! :)
>
> >>> The problem is the WiffFileReader API takes a relative eternity to
> >>> switch between experiments. It's quite slow enough as it is. :) You'll
> >>> be happy to hear that the new API does not have the same problem. With
> >>> the current API it would be faster (except possibly with huge profile
> >>> data) to first convert to XML and then use a sorting filter to convert
> >>> the XML to another file sorted by retention time. Currently there is a
> >>> sorting filter, but no built-in predicates that use it are accessible
> >>> from the command-line.
>
> >>> I'm not actually sure HOW to tell which scan is the precursor scan. In
> >>> Thermo, figuring out the precursor scan with certainty without parsing
> >>> the scan event list (which comes in a fascinating variety of formats)
> >>> can be quite tricky. I don't know if the same problems exist in ABI, and
> >>> there's no scan event list to check (AFAIK), so I punted.
>
> >>> -Matt
>
> >>> On 1/19/2011 11:34 AM, Nathan Edwards wrote:
>
> >>>> I've had this problem with a variety of tools and their handling of .wiff
> >>>> data files from Analyst, and now, having gotten msconvert to work
> >>>> (thanks Matt), I was hoping that msconvert did it "right".
>
> >>>> Unfortunately, it doesn't seem so.
>
> >>>> I believe that the scan number and retention times should increase
> >>>> monotonically in the mzXML file, and in a tandem mass-spectrometry
> >>>> experiment I expect the MS1 scan to be immediately followed by the
> >>>> MS/MS scans whose precursors are derived from that MS1 scan.
>
> >>>> A number (n >= 2) of converters (msconvert & ABI's) for .wiff files do
> >>>> not respect this file structure and output the spectra by experiment
> >>>> and cycle, with all experiment 1 (MS1 spectra) first, then all
> >>>> experiment 2 (MS/MS from first selected precursor peak from each MS1
> >>>> spectrum), then all experiment 3, etc.
>
> >>>> In the msconvert mzXML output, there isn't even any reference in the
> >>>> MS2 spectra to assist in determining the correct MS1 spectrum to
> >>>> associate with the MS2 spectrum.
>
> >>>> It is possible to use various tricks to try and determine cycle,
> >>>> experiment, and MS1/MS2 relationships but at the least these require
> >>>> sorting (globally) on retentionTime, an expensive proposition for
> >>>> large mzXML files.
>
> >>>> I'd be happy to provide an example mzXML output to demonstrate the
> >>>> issue.
>
> >>>> - n
