Re: [spctools-discuss] Re: msconvert conversion of .wiff file to mzXML presents scans out of order...

Matthew Chambers Wed, 19 Jan 2011 13:21:23 -0800

What is the corresponding nativeID for the index 1034? I'd look up that cycle and see what Analyst has there.


-Matt



On 1/19/2011 3:15 PM, Nathan Edwards wrote:

What do you mean by validation?

I have a copy of Analyst installed. I'm confident it doesn't have two
scans at the same retention time.

- n

On Jan 19, 3:01 pm, Matthew Chambers <[email protected]> wrote:

The validation would have to come from Analyst, I suspect. Do you have an
installation you can look at these cycles with? If there's a discrepancy
between Analyst and WiffFileDataReader and there's a feasible way to hack
around it, that could be done.

-Matt

On 1/19/2011 1:49 PM, Nathan Edwards wrote:

Retention time 1513.85 is mentioned twice, and I don't know how to
interpret this. I think MS1 scan num 1034 is empty in the wiff file (it
gets the retention time of the next spectrum because it takes no time
to collect), and its spectrum is a carry-over from scan 1033 (notice
that the spectrum m/z metadata is identical for basePeakMz, startMz,
and endMz, which is very unlikely). The base64 spectral data for 1033
and 1034 are identical too. That said, why are we taking an MS2
spectrum (1608) if we don't take an MS1 spectrum prior to it?
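The two symptoms described above (a repeated retention time, and a spectrum whose encoded peak data duplicates the previous one) can be flagged mechanically. The sketch below is a minimal illustration, assuming each spectrum has already been parsed into a hypothetical `Spectrum` record holding the mzXML scan `num`, `msLevel`, `retentionTime` and the raw base64 text of `<peaks>`; parsing the mzXML itself is not shown. Retention times are compared exactly, which is reasonable only because they come from the same text representation in the file.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Spectrum:
    # Hypothetical parsed record; field names are illustrative.
    num: int               # mzXML scan "num" (the arbitrary pwiz index)
    ms_level: int
    retention_time: float  # seconds
    peaks_b64: str         # base64 text of the <peaks> element, compared verbatim


def find_carry_over(spectra: List[Spectrum]) -> List[Tuple[int, int]]:
    """Return (earlier, later) scan-number pairs where a spectrum repeats the
    encoded peak data of the previous spectrum at the same MS level."""
    suspects: List[Tuple[int, int]] = []
    last_by_level: Dict[int, Spectrum] = {}
    for s in spectra:
        prev = last_by_level.get(s.ms_level)
        if prev is not None and prev.peaks_b64 == s.peaks_b64:
            suspects.append((prev.num, s.num))
        last_by_level[s.ms_level] = s
    return suspects


def repeated_retention_times(spectra: List[Spectrum]) -> List[float]:
    """Return the retention times shared by more than one spectrum, ascending."""
    counts = Counter(s.retention_time for s in spectra)
    return sorted(rt for rt, n in counts.items() if n > 1)
```

With toy data (not values from the thread), `find_carry_over([Spectrum(1, 1, 10.0, "QUJD"), Spectrum(2, 1, 12.5, "QUJD"), Spectrum(3, 2, 12.5, "REVG")])` returns `[(1, 2)]`, and `repeated_retention_times` on the same list returns `[12.5]`. Tracking the last spectrum per MS level makes the duplicate check work whether the file is in experiment-major or chronological order.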

- n

On Jan 19, 2:25 pm, Nathan Edwards <[email protected]> wrote:

Ugh. I was worried it was due to efficiency issues with the vendor
API.

Sigh. Regardless of whether the scan numbers are real or made-up, I
think that the non-chronological order of the scans in the file is an
issue. I suspect others will be surprised by this too.

At the time of conversion it is possible to read in one way and write
in another without having to re-sort globally (read from the
# experiments "caches" in turn), but without an experiment annotation
in the spectrum metadata, a global retentionTime sort is the only
robust alternative I can think of (though a linear-time merge of the
# experiments monotonically increasing runs is doable, I guess). There
are retentionTime repeats (empty spectra before the real spectrum with
the correct retention time). More about this next.
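The merge alternative mentioned above can be sketched with Python's standard `heapq.merge`, which lazily k-way merges inputs that are each already sorted. This is a minimal sketch, assuming each experiment's scans can be read as a separate stream already in retention-time order (as in the experiment-major layout described in this thread); the `(retention_time, experiment, scan_id)` tuple is a hypothetical record shape. For N scans and k experiments it costs O(N log k), effectively linear for a fixed number of experiments, and holds only one pending scan per experiment in memory.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record: (retention_time, experiment, scan_id)
Scan = Tuple[float, int, str]


def merge_experiment_runs(runs: List[Iterable[Scan]]) -> Iterator[Scan]:
    """Lazily k-way merge per-experiment runs into retention-time order.

    Each run must already be sorted by retention time. Ties on retention
    time are broken by experiment number, so experiment 1 (MS1) comes first.
    """
    return heapq.merge(*runs, key=lambda s: (s[0], s[1]))
```

For example, merging `[(1.0, 1, "a"), (3.0, 1, "d")]`, `[(1.5, 2, "b"), (3.5, 2, "e")]` and `[(2.0, 3, "c")]` yields the scans in the order a, b, c, d, e. In a converter, each run would come from its own reader or file offset, which is exactly the experiment switching that the thread says is slow in the current vendor API.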

How can I detect that the retention time is not monotonic without
reading a large chunk of the file? I guess I can look for a magic
string in the first 1K of the file (.wiff, Analyst) to decide whether
to do this expensive check, and fix.
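A minimal sketch of the cheap sniff plus the streaming check follows. Two assumptions apply: that a converted mzXML names its `.wiff` source (for example in `parentFile`) within the first 1 KB, and that retention times can be streamed in file order by some parser not shown here. The check stops at the first strict decrease, so repeated retention times (the empty-spectrum case) are not treated as disorder. For an experiment-major file, the first decrease appears at the end of the experiment-1 block, so this still reads roughly 1/k of the file before it can answer.

```python
from typing import Iterable, Optional


def first_rt_decrease(retention_times: Iterable[float]) -> Optional[int]:
    """Return the index of the first retention time smaller than its
    predecessor, or None if the sequence never decreases.

    Equal consecutive values are allowed and do not count as a decrease.
    """
    prev: Optional[float] = None
    for i, rt in enumerate(retention_times):
        if prev is not None and rt < prev:
            return i
        prev = rt
    return None


def header_mentions_wiff(path: str, nbytes: int = 1024) -> bool:
    """Cheap sniff: does '.wiff' (any case) appear in the first nbytes?"""
    with open(path, "rb") as fh:
        return b".wiff" in fh.read(nbytes).lower()
```

For instance, `first_rt_decrease([1.0, 2.0, 3.0, 1.5])` returns `3`, while `first_rt_decrease([1.0, 1.0, 2.0])` returns `None`.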

Without explicit information in the .wiff file data structure,
formally determining the precursor scan may not be possible, but the
"cycle,experiment" grouping (as opposed to experiment,cycle) will
capture the right relationships by chronology for the vast majority of
LC-MS/MS datasets.
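The "cycle,experiment" grouping can be sketched as follows, assuming the cycle and experiment are recoverable from each spectrum's mzML nativeID. The sketch assumes a WIFF-style nativeID of the form `sample=1 period=1 cycle=N experiment=M` and that experiment 1 is the MS1 survey scan. Both are assumptions to check against real output. Pairing each MS/MS scan with the experiment-1 scan of its own cycle is the chronological heuristic described above, not something the .wiff file records explicitly.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Assumed nativeID shape: "sample=1 period=1 cycle=N experiment=M"
_WIFF_ID = re.compile(r"cycle=(\d+)\s+experiment=(\d+)")


class Scan(NamedTuple):
    cycle: int
    experiment: int  # assumption: experiment 1 is the MS1 survey scan
    scan_id: str     # the nativeID string


def parse_native_id(native_id: str) -> Scan:
    """Extract (cycle, experiment) from a WIFF-style nativeID."""
    m = _WIFF_ID.search(native_id)
    if m is None:
        raise ValueError(f"no cycle/experiment in nativeID: {native_id!r}")
    return Scan(int(m.group(1)), int(m.group(2)), native_id)


def cycle_major(scans: List[Scan]) -> List[Scan]:
    """Reorder experiment-major output into (cycle, experiment) order."""
    return sorted(scans, key=lambda s: (s.cycle, s.experiment))


def infer_precursors(scans: List[Scan]) -> Dict[str, Optional[str]]:
    """Pair each non-survey scan with the experiment-1 scan of its cycle.

    The value is None for a cycle that has no experiment-1 scan.
    """
    survey = {s.cycle: s.scan_id for s in scans if s.experiment == 1}
    return {s.scan_id: survey.get(s.cycle) for s in scans if s.experiment != 1}
```

A `None` result flags the situation raised earlier in the thread, an MS2 spectrum with no MS1 spectrum in its cycle, so it can be inspected rather than silently attached to the wrong survey scan.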

- n

On Jan 19, 12:49 pm, Matthew Chambers <[email protected]> wrote:

I am well aware of this issue, but there's no schema rule requiring
the file to be in retention time order. And there is no scan number for
a WIFF scan (it uses the arbitrary index that pwiz translates to; that
part, at least, actually does increase monotonically). Use mzML and
nativeID! :)

The problem is the WiffFileReader API takes a relative eternity to
switch between experiments. It's quite slow enough as it is. :) You'll
be happy to hear that the new API does not have the same problem. With
the current API it would be faster (except possibly with huge profile
data) to first convert to XML and then use a sorting filter to convert
the XML to another file sorted by retention time. Currently there is a
sorting filter, but no built-in predicates that use it are accessible
from the command line.

I'm not actually sure HOW to tell which scan is the precursor scan. In
Thermo, figuring out the precursor scan with certainty without parsing
the scan event list (which comes in a fascinating variety of formats)
can be quite tricky. I don't know if the same problems exist in ABI,
and there's no scan event list to check (AFAIK), so I punted.

-Matt

On 1/19/2011 11:34 AM, Nathan Edwards wrote:

I've had this problem with a variety of tools and their handling of
.wiff data files from Analyst, and now, having gotten msconvert to work
(thanks Matt), I was hoping that msconvert did it "right".

Unfortunately, it doesn't seem so.

I believe that the scan numbers and retention times should increase
monotonically in the mzXML file, and in a tandem mass spectrometry
experiment, I expect the MS1 scan to be immediately followed by the
MS/MS scans whose precursors are derived from that MS1 scan.

A number (n >= 2) of converters (msconvert and ABI's) for .wiff files
do not respect this file structure and output the spectra by experiment
and cycle: all experiment 1 (MS1) spectra first, then all experiment 2
(MS/MS from the first selected precursor peak of each MS1 spectrum),
then all experiment 3, etc.

In the msconvert mzXML output, there isn't even any reference in the
MS2 spectra to assist in determining the correct MS1 spectrum to
associate with the MS2 spectrum.

It is possible to use various tricks to try and determine cycle,
experiment, and MS1/MS2 relationships but at the least these require
sorting (globally) on retentionTime, an expensive proposition for
large mzXML files.
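One such trick, using only what the mzXML output does carry (scan num, msLevel, retentionTime), is sketched below. It is a heuristic, not a recorded relationship: it sorts globally by retention time, breaking ties so that MS1 comes first, then assigns each MS2 scan to the most recent MS1 scan before it. The `Spec` record is hypothetical. One caveat from this thread applies: an empty carry-over MS1 that shares its retention time with the next spectrum would be picked as the precursor, so such spectra may need to be dropped first.

```python
from typing import Dict, List, NamedTuple, Optional


class Spec(NamedTuple):
    scan_num: int          # mzXML "num" (arbitrary pwiz index)
    ms_level: int
    retention_time: float  # seconds


def assign_by_chronology(spectra: List[Spec]) -> Dict[int, Optional[int]]:
    """Map each MS2 scan number to the most recent preceding MS1 scan number.

    Global O(N log N) sort by (retention time, MS level); an MS2 scan that
    appears before any MS1 scan maps to None.
    """
    ordered = sorted(spectra, key=lambda s: (s.retention_time, s.ms_level))
    precursor: Dict[int, Optional[int]] = {}
    last_ms1: Optional[int] = None
    for s in ordered:
        if s.ms_level == 1:
            last_ms1 = s.scan_num
        else:
            precursor[s.scan_num] = last_ms1
    return precursor
```

Given toy input in experiment-major order, with MS1 scans 1 and 2 at 0.0 s and 3.0 s and MS2 scans 3, 5, 4 and 6 at 1.0, 2.0, 4.0 and 5.0 s, the result maps 3 and 5 to 1, and 4 and 6 to 2.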

I'd be happy to provide an example mzXML output to demonstrate the
issue.

- n


--
You received this message because you are subscribed to the Google Groups 
"spctools-discuss" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/spctools-discuss?hl=en.
