Re: [spctools-discuss] Problems with PTMProphet and Tandem2XML

David Shteynberg Mon, 05 Feb 2018 09:40:20 -0800

Hello Thibault,

I converted the raw file on our system and got a file with correct
references in the mzML result.  I started to compare your file to my file
and noticed that your file seems to have been created from an mgf input
file.  Can you try your process again and convert to mzML starting from the
raw file instead?  Here are the msconvert parameters we use:


 --mzML --filter "peakPicking true [1,2]" -z --32
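
For scripting the conversion, the parameters above can be assembled into an argument list. This is only a sketch: the function name `build_msconvert_command` is hypothetical, the flag descriptions in the comments reflect my reading of the msconvert help text, and the example does not actually launch msconvert.

```python
import shlex


def build_msconvert_command(raw_file):
    """Return an msconvert argument list using the parameters quoted above.

    The helper name is hypothetical; only the flags come from the message.
    """
    return [
        "msconvert",
        raw_file,
        "--mzML",                               # write mzML output
        "--filter", "peakPicking true [1,2]",   # peak picking on MS levels 1 and 2
        "-z",                                   # zlib compression of binary data
        "--32",                                 # 32-bit binary encoding
    ]


cmd = build_msconvert_command("20131226_HeLa_bRP01_120min.raw")
# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` on a machine where msconvert is installed, which avoids quoting problems with the filter string.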


The software versions we currently use are:

 <softwareList count="2">
      <software id="Xcalibur" version="2.2-164600/2.2.1.1646">
        <cvParam cvRef="MS" accession="MS:1000532" name="Xcalibur" value=""/>
      </software>
      <software id="pwiz" version="3.0.11516">
        <cvParam cvRef="MS" accession="MS:1000615" name="ProteoWizard software" value=""/>
      </software>
    </softwareList>


Thanks,
-David


On Thu, Feb 1, 2018 at 1:47 PM, Thibault Robin <[email protected]> wrote:

> Dear David,
>
> Thank you for your insightful explanations. The raw file can be downloaded
> from PRIDE Archive using this link:
> ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2016/12/PXD001441/20131226_HeLa_bRP01_120min.raw
>
> For the conversion, I use the latest version of msconvert with default
> settings, but I do indeed exclude the MS1 scans to save hard drive space
> (my set of mzML files already weighs around 3 TB without the MS1
> scans...)
>
> Thank you for your efforts,
>
> Thibault
>
>
>
>
> 2018-02-01 22:35 GMT+01:00 David Shteynberg <David.Shteynberg@systemsbiology.org>:
>
>> Thibault,
>>
>> I have a question about how you are generating the mzML files for
>> searching.  What tool are you using for this task, and do you have the raw
>> file available for me to test?
>>
>> I have noticed that your files seem to be missing the MS1 scans.  This
>> alone is fine, but in order to identify the spectrum in the file we need
>> either the scan number or the index of the scan to pull up the correct
>> spectrum.  It appears that the scan numbers are not listed in the index at
>> the end of your mzML file.  The tail end of the mzML file should look like:
>>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51126">481306970</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51127">481314790</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51128">481322971</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51129">481330551</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51130">481339052</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51131">481347643</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51132">481355682</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51133">481363528</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51134">481370856</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51135">481379107</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51136">481386761</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51137">481395077</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51138">481402922</offset>
>>       <offset idRef="controllerType=0 controllerNumber=1 scan=51139">481410705</offset>
>>     </index>
>>     <index name="chromatogram">
>>       <offset idRef="TIC">481418736</offset>
>>     </index>
>>   </indexList>
>>   <indexListOffset>482010351</indexListOffset>
>>   <fileChecksum>567fb08dd79b57d085a11680b48d666bdeacce73</fileChecksum>
>> </indexedmzML>
>>
>>
>>
>> The tail end of your mzML file looks like:
>>
>>
>>       <offset idRef="index=26193">437068879</offset>
>>       <offset idRef="index=26194">437074577</offset>
>>       <offset idRef="index=26195">437079014</offset>
>>       <offset idRef="index=26196">437084952</offset>
>>       <offset idRef="index=26197">437091489</offset>
>>       <offset idRef="index=26198">437096988</offset>
>>       <offset idRef="index=26199">437103093</offset>
>>       <offset idRef="index=26200">437108018</offset>
>>       <offset idRef="index=26201">437113514</offset>
>>       <offset idRef="index=26202">437120031</offset>
>>       <offset idRef="index=26203">437126116</offset>
>>       <offset idRef="index=26204">437131854</offset>
>>       <offset idRef="index=26205">437137939</offset>
>>       <offset idRef="index=26206">437144269</offset>
>>       <offset idRef="index=26207">437151494</offset>
>>       <offset idRef="index=26208">437156976</offset>
>>       <offset idRef="index=26209">437161858</offset>
>>     </index>
>>     <index name="chromatogram">
>>     </index>
>>   </indexList>
>>   <indexListOffset>437167166</indexListOffset>
>>   <fileChecksum>1fdc7999912fcb3e83a93d74c06f03dc3695005c</fileChecksum>
>> </indexedmzML>
>>
>>
>>
>> In your mzML file the TPP can only identify your spectra correctly by
>> their index (1-based).  This suggests that even the old Jackhammer version
>> of the X!Tandem pipeline will not be able to extract the correct scan from
>> the mzML file.
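
The difference between the two index tails shown above can be checked mechanically. The following is a rough sketch rather than anything the TPP does internally: it applies regular expressions to the text at the end of an indexed mzML file, and the helper names `spectrum_id_refs` and `has_scan_numbers` are hypothetical.

```python
import re

# Matches one <offset> entry and captures its idRef and byte offset.
OFFSET_RE = re.compile(r'<offset idRef="([^"]*)">(\d+)</offset>')
# Matches a scan number inside an idRef such as "... scan=51126".
SCAN_RE = re.compile(r"\bscan=(\d+)")


def spectrum_id_refs(tail_text):
    """Return the idRef strings of the spectrum index entries in tail_text."""
    # Ignore the chromatogram index (e.g. idRef="TIC"), if one is present.
    spectrum_part = tail_text.split('<index name="chromatogram">')[0]
    return [match.group(1) for match in OFFSET_RE.finditer(spectrum_part)]


def has_scan_numbers(tail_text):
    """True if every spectrum idRef found carries a scan=N component."""
    refs = spectrum_id_refs(tail_text)
    return bool(refs) and all(SCAN_RE.search(ref) for ref in refs)


expected_tail = (
    '<offset idRef="controllerType=0 controllerNumber=1 scan=51139">481410705</offset>\n'
    '</index>\n<index name="chromatogram">\n'
    '<offset idRef="TIC">481418736</offset>\n'
)
problem_tail = '<offset idRef="index=26209">437161858</offset>\n</index>\n'

print(has_scan_numbers(expected_tail), has_scan_numbers(problem_tail))
```

To apply it to a real file, read only the last few kilobytes and pass that text to `has_scan_numbers`, since the index sits at the end of the file.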
>>
>>
>> For example, tandem refers to spectrum scan=9401 with a zero-based index
>> of 5056.
>>
>> In the Jackhammer pepXML this spectrum is encoded as:
>>
>> <spectrum_query spectrum="20131226_HeLa_bRP01_120min.09401.09401.2"
>> start_scan="9401" end_scan="9401" precursor_neutral_mass="903.3941"
>> assumed_charge="2" index="5057" retention_time_sec="1966.72">
>>
>>
>> Notice that the 1-based index is 5057, i.e. 5056+1.
>>
>> In order to extract this scan from your mzML file as it stands, this
>> spectrum should be encoded in pepXML with scan number 5057, that is:
>>
>>  <spectrum_query spectrum="20131226_HeLa_bRP01_120min.05057.05057.2"
>> start_scan="5057" end_scan="5057" precursor_neutral_mass="903.3941"
>> assumed_charge="2" index="1" retention_time_sec="1966.722">
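
The two encodings above differ only in which number is used as the scan. As a minimal sketch, assuming the `spectrum` attribute follows the `basename.scan.scan.charge` pattern with five-digit zero padding seen in both examples (the helper names are hypothetical and this is not the Tandem2XML code):

```python
def one_based_index(zero_based_index):
    """Convert the zero-based index reported by tandem to a 1-based index."""
    return zero_based_index + 1


def pepxml_spectrum_name(base_name, scan, charge):
    """Build a spectrum name of the form basename.SSSSS.SSSSS.charge."""
    return f"{base_name}.{scan:05d}.{scan:05d}.{charge}"


base = "20131226_HeLa_bRP01_120min"

# First version: named after the original scan number, 9401.
by_scan = pepxml_spectrum_name(base, 9401, 2)

# Second version: named after the 1-based index, 5056 + 1 = 5057.
by_index = pepxml_spectrum_name(base, one_based_index(5056), 2)

print(by_scan)
print(by_index)
```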
>>
>> I have added some new code to Tandem2XML so that it can refer to the
>> spectrum in either the first or the second form, depending on options set
>> by the user, but I first want to address the issue with the mzML file.
>> Once I understand how your mzML file came to be, I can recommend the best
>> way forward for properly processing this type of data.
>>
>> Thanks,
>> -David
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Jan 31, 2018 at 11:31 PM, Thibault Robin <[email protected]>
>> wrote:
>>
>>> Dear David,
>>>
>>> Here is the tandem file produced using the tpp version of X!Tandem:
>>> https://www.dropbox.com/s/nm3lgs540urpl6o/20131226_HeLa_bRP01_120min.tandem?dl=0
>>>
>>> Hope it helps and thank you for your time.
>>>
>>> Cheers,
>>>
>>> Thibault
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "spctools-discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/spctools-discuss.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>

