Dear David,

Thank you for your insightful explanations. The raw file can be downloaded
from PRIDE Archive using this link:
ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2016/12/PXD001441/20131226_HeLa_bRP01_120min.raw

For the conversion, I use the latest version of msconvert with default
settings, but I do indeed exclude the MS1 scans because of hard drive space
constraints (my set of mzML files already weighs around 3 TB without the MS1
scans...).
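
In case it is useful for checking other files, the index entries you describe
can be inspected with a short script. This is only a minimal sketch, assuming an
indexed mzML file whose index list fits within the last megabyte or so of the
file; the function names and the tail size are my own choices and are not part
of the TPP or ProteoWizard.

```python
import re

# Matches index entries such as:
#   <offset idRef="controllerType=0 controllerNumber=1 scan=51126">481306970</offset>
#   <offset idRef="index=26193">437068879</offset>
OFFSET_RE = re.compile(r'<offset idRef="([^"]*)">(\d+)</offset>')
SCAN_RE = re.compile(r'\bscan=(\d+)')


def parse_spectrum_offsets(text):
    """Return (idRef, byte offset) pairs from the spectrum index in a chunk of mzML text.

    Anything after the chromatogram index opening tag is ignored, so an entry
    such as idRef="TIC" is not counted as a spectrum.
    """
    spectrum_part = text.split('<index name="chromatogram">')[0]
    return [(id_ref, int(offset)) for id_ref, offset in OFFSET_RE.findall(spectrum_part)]


def summarize_id_refs(offsets):
    """Count spectrum index entries whose idRef carries a scan number versus those that do not."""
    with_scan = sum(1 for id_ref, _ in offsets if SCAN_RE.search(id_ref))
    return {"with_scan": with_scan, "without_scan": len(offsets) - with_scan}


def read_tail(path, n_bytes=1_000_000):
    """Read the last n_bytes of a file as text (hypothetical helper; the tail size is a guess)."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # jump to the end to learn the file size
        size = handle.tell()
        handle.seek(max(0, size - n_bytes))
        return handle.read().decode("utf-8", errors="replace")
```

Calling `summarize_id_refs(parse_spectrum_offsets(read_tail(path)))` on a file
then shows whether the trailing index entries are keyed by scan number or only
by index.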

Thank you for your efforts,

Thibault




2018-02-01 22:35 GMT+01:00 David Shteynberg <
[email protected]>:

> Thibault,
>
> I have a question about how you are generating the mzML files for
> searching.  What tool are you using for this task, and do you have the raw
> file available for me to test?
>
> I have noticed that your files seem to be missing the MS1 scans.  This
> alone is fine but in order to identify the spectrum in the file we need
> either the scan number or the index of the scan to pull up the correct
> spectrum.  It appears that the scan numbers are not listed in the index at
> the end of your mzML file.  The tail end of the mzML file should look like:
>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51126">481306970</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51127">481314790</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51128">481322971</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51129">481330551</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51130">481339052</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51131">481347643</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51132">481355682</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51133">481363528</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51134">481370856</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51135">481379107</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51136">481386761</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51137">481395077</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51138">481402922</offset>
>       <offset idRef="controllerType=0 controllerNumber=1 scan=51139">481410705</offset>
>     </index>
>     <index name="chromatogram">
>       <offset idRef="TIC">481418736</offset>
>     </index>
>   </indexList>
>   <indexListOffset>482010351</indexListOffset>
>   <fileChecksum>567fb08dd79b57d085a11680b48d666bdeacce73</fileChecksum>
> </indexedmzML>
>
>
>
> The tail end of your mzML file looks like:
>
>
>       <offset idRef="index=26193">437068879</offset>
>       <offset idRef="index=26194">437074577</offset>
>       <offset idRef="index=26195">437079014</offset>
>       <offset idRef="index=26196">437084952</offset>
>       <offset idRef="index=26197">437091489</offset>
>       <offset idRef="index=26198">437096988</offset>
>       <offset idRef="index=26199">437103093</offset>
>       <offset idRef="index=26200">437108018</offset>
>       <offset idRef="index=26201">437113514</offset>
>       <offset idRef="index=26202">437120031</offset>
>       <offset idRef="index=26203">437126116</offset>
>       <offset idRef="index=26204">437131854</offset>
>       <offset idRef="index=26205">437137939</offset>
>       <offset idRef="index=26206">437144269</offset>
>       <offset idRef="index=26207">437151494</offset>
>       <offset idRef="index=26208">437156976</offset>
>       <offset idRef="index=26209">437161858</offset>
>     </index>
>     <index name="chromatogram">
>     </index>
>   </indexList>
>   <indexListOffset>437167166</indexListOffset>
>   <fileChecksum>1fdc7999912fcb3e83a93d74c06f03dc3695005c</fileChecksum>
> </indexedmzML>
>
>
>
> In your mzML file the TPP can only identify your spectra correctly by
> their index (1-based).  This suggests that even the old Jackhammer version
> of the X!Tandem pipeline will not be able to extract the correct scan from
> the mzML file.
>
>
> For example, Tandem refers to spectrum scan=9401 with a zero-based index
> of 5056.
>
> In the Jackhammer pepXML this spectrum is encoded as:
>
> <spectrum_query spectrum="20131226_HeLa_bRP01_120min.09401.09401.2"
> start_scan="9401" end_scan="9401" precursor_neutral_mass="903.3941"
> assumed_charge="2" index="5057" retention_time_sec="1966.72">
>
>
> Notice that the 1-based index is 5057, i.e. 5056+1.
>
> In order to be able to extract this scan from your mzML file as it stands,
> this spectrum should be encoded in pepXML with scan number 5057, or
>
>  <spectrum_query spectrum="20131226_HeLa_bRP01_120min.05057.05057.2"
> start_scan="5057" end_scan="5057" precursor_neutral_mass="903.3941"
> assumed_charge="2" index="1" retention_time_sec="1966.722">
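
To make sure I follow the index arithmetic and the naming used in the two
spectrum_query examples, here is a minimal sketch. The function names are
hypothetical, and the five-digit zero padding is only inferred from the two
examples above, not taken from the pepXML specification or the Tandem2XML
source.

```python
def one_based_index(zero_based_index):
    """Convert a zero-based spectrum index to the 1-based index used in pepXML."""
    return zero_based_index + 1


def spectrum_name(base_name, scan, charge):
    """Build a spectrum attribute of the form base.SSSSS.SSSSS.Z.

    Assumption: start and end scan are equal and padded to five digits,
    as in the two examples quoted above.
    """
    return f"{base_name}.{scan:05d}.{scan:05d}.{charge}"


base = "20131226_HeLa_bRP01_120min"

# Encoding by original scan number (first example).
by_scan = spectrum_name(base, 9401, 2)

# Encoding by 1-based index, for a file whose index has no scan numbers (second example).
by_index = spectrum_name(base, one_based_index(5056), 2)
```

With these inputs, `by_scan` reproduces the spectrum name of the first example
and `by_index` that of the second.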
>
> I have added some new code to Tandem2XML to allow it to refer to the
> spectrum in the first or second form, based on options the user can
> set, but I first want to address the issue with the mzML file.  Once I
> understand how your mzML file came to be, I can recommend the best way
> forward for properly processing this type of data.
>
> Thanks,
> -David
>
>
>
>
>
>
>
> On Wed, Jan 31, 2018 at 11:31 PM, Thibault Robin <[email protected]>
> wrote:
>
>> Dear David,
>>
>> Here is the tandem file produced using the TPP version of X!Tandem:
>> https://www.dropbox.com/s/nm3lgs540urpl6o/20131226_HeLa_bRP01_120min.tandem?dl=0
>>
>> Hope it helps and thank you for your time.
>>
>> Cheers,
>>
>> Thibault
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "spctools-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/spctools-discuss.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
