What do you mean validation? I have a copy of Analyst installed. I'm confident it doesn't have two scans at the same retention time.
- n

On Jan 19, 3:01 pm, Matthew Chambers <[email protected]> wrote:
> The validation would have to come from Analyst I suspect. Do you have an
> installation you can look at these cycles with? If there's a discrepancy
> between Analyst and WiffFileDataReader and there's a feasible way to hack
> around it, that could be done.
>
> -Matt
>
> On 1/19/2011 1:49 PM, Nathan Edwards wrote:
>
> > Retention time 1513.85 is mentioned twice, I don't know how to
> > interpret this. I think MS1 scan num 1034 is empty in the wiff file
> > (gets the retention time of next spectrum as it takes no time to
> > collect), and its spectrum is a carry over from scan 1033 (notice that
> > the spectrum mz metadata is identical for basePeakMz, startMz, and
> > endMz, very unlikely). Base 64 spectral data for 1033 and 1034 are
> > identical too. That said, why are we taking an MS2 spectrum (1608) if
> > we don't take an MS1 spectrum prior to it?
> >
> > - n
> >
> > On Jan 19, 2:25 pm, Nathan Edwards <[email protected]> wrote:
> >> Ugh. I was worried it was due to efficiency issues with the vendor
> >> API.
> >>
> >> Sigh. Regardless of whether the scan numbers are real or made-up, I
> >> think that the non-chronological order of the scans in the file is an
> >> issue. I suspect others will be surprised by this too.
> >>
> >> At the time of conversion it is possible to read in one way and write
> >> in another without having to re-sort globally (read from # experiments
> >> "caches" in turn) but without an experiment annotation in the spectra
> >> metadata, a global retentionTime sort is the only robust alternative I
> >> can think of (though linear time merge sort for # experiments
> >> monotonically increasing runs is doable, I guess). There are
> >> retentionTime repeats (empty spectra before the real spectrum with the
> >> correct retention time). More about this next.
> >>
> >> How can I detect that the retention time is not monotonic without
> >> reading a large chunk of the file? I guess I can look for a magic
> >> string in the first 1K of the file (.wiff, Analyst) to decide whether
> >> to do this expensive check, and fix.
> >>
> >> Without explicit information in the .wiff file data structure,
> >> formally determining the precursor scan may not be possible, but the
> >> "cycle,experiment" grouping (as opposed to experiment,cycle) will
> >> capture the right relationships by chronology for the vast majority of
> >> LC-MS/MS datasets.
> >>
> >> - n
> >>
> >> On Jan 19, 12:49 pm, Matthew Chambers <[email protected]> wrote:
> >>> I am well aware of this issue, but there's no schematic rule about the
> >>> file being in retention time order. And there is no scan number for a
> >>> WIFF scan (since it uses the arbitrary index that pwiz translates to,
> >>> that part at least actually does increase monotonically). Use mzML and
> >>> nativeID! :)
> >>>
> >>> The problem is the WiffFileReader API takes a relative eternity to
> >>> switch between experiments. It's quite slow enough as it is. :) You'll
> >>> be happy to hear that the new API does not have the same problem. With
> >>> the current API it would be faster (except possibly with huge profile
> >>> data) to first convert to XML and then use a sorting filter to convert
> >>> the XML to another file sorted by retention time. Currently there is a
> >>> sorting filter, but no built-in predicates that use it are accessible
> >>> from the command-line.
> >>>
> >>> I'm not actually sure HOW to tell which scan is the precursor scan. In
> >>> Thermo, figuring out the precursor scan with certainty without parsing
> >>> the scan event list (which comes in a fascinating variety of formats)
> >>> can be quite tricky. I don't know if the same problems exist in ABI
> >>> and there's no scan event list to check (AFAIK), so I punted.
> >>>
> >>> -Matt
> >>>
> >>> On 1/19/2011 11:34 AM, Nathan Edwards wrote:
> >>>> I've had this problem with a variety of tools and their handling of
> >>>> .wiff data files from Analyst, and now having gotten msconvert to
> >>>> work (thanks Matt) I was hoping that msconvert did it "right".
> >>>>
> >>>> Unfortunately, it doesn't seem so.
> >>>>
> >>>> I believe that the scan number and retention times should increase
> >>>> monotonically in the mzXML file and in a tandem mass-spectrometry
> >>>> experiment, I expect the MS1 scan to be immediately followed by the
> >>>> MS/MS scans whose precursors are derived from the MS1 scan.
> >>>>
> >>>> A number (n >= 2) of converters (msconvert & ABI's) for .wiff files
> >>>> do not respect this file structure and output the spectra by
> >>>> experiment and cycle, with all experiment 1 (MS1 spectra) first, then
> >>>> all experiment 2 (MS/MS from first selected precursor peak from each
> >>>> MS1 spectrum), then all experiment 3, etc.
> >>>>
> >>>> In the msconvert mzXML output, there isn't even any reference in the
> >>>> MS2 spectra to assist in determining the correct MS1 spectrum to
> >>>> associate with the MS2 spectrum.
> >>>>
> >>>> It is possible to use various tricks to try and determine cycle,
> >>>> experiment, and MS1/MS2 relationships but at the least these require
> >>>> sorting (globally) on retentionTime, an expensive proposition for
> >>>> large mzXML files.
> >>>>
> >>>> I'd be happy to provide an example mzXML output to demonstrate the
> >>>> issue.
> >>>>
> >>>> - n
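
For anyone who wants to try the "merge the per-experiment runs" idea from the thread, here is a rough Python sketch. It is not pwiz or msconvert code: the `Scan` record and the three helper names are made up for illustration, and it assumes the scans have already been parsed out of the mzXML into one list per experiment, each list in increasing retention time. It checks for out-of-order retention times, merges the runs with the standard library's `heapq.merge` (O(n log k) for k experiments, so close to linear when k is small), and then guesses each MS/MS scan's precursor as the most recent MS1 scan in the merged order. Ties in retention time are broken by experiment number, which is also an assumption.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Scan:
    """Hypothetical minimal scan record, not a pwiz type."""
    experiment: int          # assumed: 1 = MS1 survey, 2+ = MS/MS
    cycle: int
    ms_level: int
    retention_time: float    # seconds
    precursor_index: Optional[int] = None  # set by assign_precursors


def first_out_of_order(scans: Iterable[Scan]) -> Optional[int]:
    """Index of the first scan whose retention time drops below the
    previous scan's, or None if the sequence never decreases."""
    previous = float("-inf")
    for index, scan in enumerate(scans):
        if scan.retention_time < previous:
            return index
        previous = scan.retention_time
    return None


def merge_by_retention_time(runs: List[List[Scan]]) -> List[Scan]:
    """Merge per-experiment runs, each already sorted by retention time,
    into one chronological list without a global sort."""
    return list(
        heapq.merge(*runs, key=lambda s: (s.retention_time, s.experiment))
    )


def assign_precursors(scans: List[Scan]) -> List[Scan]:
    """Point each MS/MS scan at the most recent MS1 scan before it.
    This is chronology-based guessing, not information read from the file."""
    last_ms1: Optional[int] = None
    for index, scan in enumerate(scans):
        if scan.ms_level == 1:
            last_ms1 = index
        else:
            scan.precursor_index = last_ms1
    return scans
```

Typical use would be to call `first_out_of_order` on the scans in file order and, only if it returns an index, run `assign_precursors(merge_by_retention_time(runs))`. The empty MS1 spectra that share a retention time with the next spectrum would still need separate handling.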
