Hi Ludovic,
On 2/11/2011 2:19 AM, lgillet wrote:
> Dear Matt,
> My problem (and others' from our lab) is that, with the current
> version of msconvert, you almost cannot do anything with the converted
> Agilent data. For example, MzXML2Search spits out a "segmentation
> fault" error message as soon as one scan number exceeds 27'219 (i.e. if
> scan >= 27'220 it crashes; this probably has something to do with single/
> double integers stuff?). Second, our Sequest server (Sage-Sorcerer)
> also crashes on those files (the number of .dta files created from the
> mzXML is again limited to a well-defined scan number limit,
> and therefore very few spectra are actually searched).
If this is true of MzXML2Search then it's a bug. I thought it was fixed actually. Thermo Velos
instruments easily exceed 30000 spectra. And if it's LTQ, you double that to 60000 (DTAs).
> I don't know if I make myself clear but here are my comments:
> 1) could you verify why msconvert is behaving differently from Trapper
> (while they supposedly use the same Agilent libraries) when exporting
> the scan numbers (Trapper performing the correct conversion by
> conserving the same scan numbering as the raw file)?
>
> I do not know what the Agilent API does or does not do
> to the data, but what I can tell is that the scans are indeed
> *consecutively* numbered (from 1 till 5'000 or more) in the raw data
> when you browse them with the Agilent MassHunter Qual software. So my
> guess is that there might still be something fishy about msconvert
> here. My understanding was that the former converter (Trapper from
> Natalie Tasman) was actually relying on the same Agilent API as well!
> Maybe Natalie could comment on that. And since Trapper was conserving
> the proper numbering of the scans as in the raw data, something might
> have changed upon switching to msconvert.
I'll quote my post to the psidev-ms mailing list from 6/30/2009:
In the MassHunter API there are two ways to uniquely address a spectrum: by
"row number" or "scan id". Row number is essentially a 0-based index
that refers to the spectra after the acquisition software has done
something...perhaps internal merging? Scan id represents the ordinal
number of acquisitions as they come off the instrument. So, at least on
their (Q)TOF instruments, the rowNumber is very disparate from the
scanId, but both of them are unique identifiers that can technically be
used to refer to a native spectrum. The kink is that the MassHunter API
only refers to the parent scan by its scan id and doesn't provide a way
to directly translate a scan id to a row number - translation must be
done indirectly by enumerating all the row numbers and building a
mapping of scan id to row number. For this reason I would recommend that
the nativeID format be defined as "scanId=xsd:nonNegativeInteger" but
I'm open to comment on this!
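
To make the indirect translation concrete, here is a minimal Python sketch of the mapping described above. It does not call the MassHunter API: `scan_id_for_row` is a hypothetical stand-in for whatever call returns the scan id of a given row, and the scan ids are made up for illustration.

```python
from typing import Callable, Dict


def build_scan_id_to_row(
    total_rows: int, scan_id_for_row: Callable[[int], int]
) -> Dict[int, int]:
    """Enumerate every 0-based row number once and index the rows by scan id.

    scan_id_for_row is a hypothetical accessor standing in for the real
    API call that reports the scan id of a row.
    """
    mapping: Dict[int, int] = {}
    for row in range(total_rows):
        scan_id = scan_id_for_row(row)
        if scan_id in mapping:
            # Both identifiers are supposed to be unique, so fail loudly.
            raise ValueError(f"scan id {scan_id} appears on more than one row")
        mapping[scan_id] = row
    return mapping


# Made-up, non-consecutive scan ids in row order (illustration only).
fake_scan_ids = [1437, 2912, 4388, 5861]
scan_id_to_row = build_scan_id_to_row(len(fake_scan_ids), fake_scan_ids.__getitem__)

# A parent scan is reported by scan id; the mapping recovers its row number.
parent_row = scan_id_to_row[2912]
```

The mapping costs one pass over all rows, after which each parent-scan lookup is a single dictionary access.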
This explains why we adopted scanId to be used as the nativeID despite it not being consecutive. It
was not a strong reason for choosing one over the other, but ids being consecutive means even less.
However, if it's true that it's impossible to find a scan in MassHunter with the scanId, that's a
major issue of which I was unaware! That's a pretty compelling reason to switch to the row number,
but we've never had to change a nativeID format before. We'll have to discuss it with Agilent and
the PSI-MS working group.
> 2) If it's not possible for you to fix msconvert in that respect,
> would it be possible to provide an option in msconvert to
> renumber the scans consecutively from 1 till the end? I guess such
> an option may one day be useful anyway for other people and other
> applications.
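
For illustration only, the renumbering requested here amounts to something like the Python sketch below. The function name and the scan ids are hypothetical; this is not an existing msconvert option.

```python
from typing import Dict, Iterable


def renumber_consecutively(native_ids: Iterable[int]) -> Dict[int, int]:
    """Map each native scan id, in file order, to a consecutive 1-based number."""
    return {native_id: n for n, native_id in enumerate(native_ids, start=1)}


# Made-up native scan ids in the order they appear in the file.
new_numbers = renumber_consecutively([1437, 2912, 4388])
```

Any reference to a parent scan would have to be translated through the same mapping so that precursor links stay valid.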
Yes, it's possible to implement this, but as I said above there is an imminent problem with your
pipeline if you can't support scan numbers over 27219. I have no idea why that number would be a
threshold; 32767 is the max for a signed 16-bit integer and 65535 is the max for unsigned. This
should be an easy bug to fix too (just changing the scan number data type). If the 16-bit integer
problems are fixed, is the consecutive option still necessary?
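
To show what I mean by the 16-bit limits, here is a small Python sketch using the standard `ctypes` module to mimic storing a scan number in a 16-bit field. It only illustrates the arithmetic; I am not claiming this is what MzXML2Search does internally, and the helper names are mine.

```python
import ctypes


def as_int16(n: int) -> int:
    """Value of n after being stored in a signed 16-bit integer."""
    return ctypes.c_int16(n).value


def as_uint16(n: int) -> int:
    """Value of n after being stored in an unsigned 16-bit integer."""
    return ctypes.c_uint16(n).value


# Scan numbers around the reported crash point and the two 16-bit limits.
for scan in (27219, 27220, 32767, 32768, 60000, 65536):
    print(scan, as_int16(scan), as_uint16(scan))
```

Note that 27220 still fits in both types, which is why a plain 16-bit overflow does not by itself explain a threshold at 27219.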
Hope this helps,
-Matt