Dear David,

Thank you for your insightful explanations. The raw file can be downloaded from PRIDE Archive using this link:
ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2016/12/PXD001441/20131226_HeLa_bRP01_120min.raw

For the conversion, I use the latest version of msconvert with default settings, but I do indeed exclude the MS1 scans because of hard drive space concerns (my set of mzML files already weighs around 3 TB without MS1 scans).

Thank you for your efforts,

Thibault

2018-02-01 22:35 GMT+01:00 David Shteynberg <[email protected]>:

> Thibault,
>
> I have a question about how you are generating the mzML files for
> searching. What tool are you using for this task, and do you have the raw
> file available for me to test?
>
> I have noticed that your files seem to be missing the MS1 scans. This
> alone is fine, but in order to identify the spectrum in the file we need
> either the scan number or the index of the scan to pull up the correct
> spectrum. It appears that the scan numbers are not listed in the index at
> the end of your mzML file. The tail end of the mzML file should look like:
>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51126">481306970</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51127">481314790</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51128">481322971</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51129">481330551</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51130">481339052</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51131">481347643</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51132">481355682</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51133">481363528</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51134">481370856</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51135">481379107</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51136">481386761</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51137">481395077</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51138">481402922</offset>
> <offset idRef="controllerType=0 controllerNumber=1 scan=51139">481410705</offset>
> </index>
> <index name="chromatogram">
> <offset idRef="TIC">481418736</offset>
> </index>
> </indexList>
> <indexListOffset>482010351</indexListOffset>
> <fileChecksum>567fb08dd79b57d085a11680b48d666bdeacce73</fileChecksum>
> </indexedmzML>
>
> The tail end of your mzML file looks like:
>
> <offset idRef="index=26193">437068879</offset>
> <offset idRef="index=26194">437074577</offset>
> <offset idRef="index=26195">437079014</offset>
> <offset idRef="index=26196">437084952</offset>
> <offset idRef="index=26197">437091489</offset>
> <offset idRef="index=26198">437096988</offset>
> <offset idRef="index=26199">437103093</offset>
> <offset idRef="index=26200">437108018</offset>
> <offset idRef="index=26201">437113514</offset>
> <offset idRef="index=26202">437120031</offset>
> <offset idRef="index=26203">437126116</offset>
> <offset idRef="index=26204">437131854</offset>
> <offset idRef="index=26205">437137939</offset>
> <offset idRef="index=26206">437144269</offset>
> <offset idRef="index=26207">437151494</offset>
> <offset idRef="index=26208">437156976</offset>
> <offset idRef="index=26209">437161858</offset>
> </index>
> <index name="chromatogram">
> </index>
> </indexList>
> <indexListOffset>437167166</indexListOffset>
> <fileChecksum>1fdc7999912fcb3e83a93d74c06f03dc3695005c</fileChecksum>
> </indexedmzML>
>
> In your mzML file the TPP can only identify your spectra correctly by
> their index (1-based). This suggests that even the old Jackhammer version
> of the X!Tandem pipeline will not be able to extract the correct scan from
> the mzML file.
>
> For example, tandem refers to spectrum scan=9401 with a zero-based index
> of 5056.
>
> In the Jackhammer pepXML this spectrum is encoded as:
>
> <spectrum_query spectrum="20131226_HeLa_bRP01_120min.09401.09401.2" start_scan="9401" end_scan="9401" precursor_neutral_mass="903.3941" assumed_charge="2" index="5057" retention_time_sec="1966.72">
>
> Notice that the 1-based index is 5057, or 5056+1.
>
> In order to be able to extract this scan from your mzML file as it stands,
> this spectrum should be encoded in pepXML with scan number 5057, or:
>
> <spectrum_query spectrum="20131226_HeLa_bRP01_120min.05057.05057.2" start_scan="5057" end_scan="5057" precursor_neutral_mass="903.3941" assumed_charge="2" index="1" retention_time_sec="1966.722">
>
> I have added some new code to Tandem2XML to allow it to refer to the
> spectrum in the first or second form, based on options the user can
> set, but I want to first address the issue with the mzML file. Once I
> understand how your mzML file came to be, I can recommend the best way
> forward for properly processing this type of data.
>
> Thanks,
> -David
>
> On Wed, Jan 31, 2018 at 11:31 PM, Thibault Robin <[email protected]> wrote:
>
>> Dear David,
>>
>> Here is the tandem file produced using the TPP version of X!Tandem:
>> https://www.dropbox.com/s/nm3lgs540urpl6o/20131226_HeLa_bRP01_120min.tandem?dl=0
>>
>> Hope it helps and thank you for your time.
>>
>> Cheers,
>>
>> Thibault
>>
>> --
>> You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/spctools-discuss.
>> For more options, visit https://groups.google.com/d/optout.
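For anyone wanting to check their own files against the two index tails quoted in this thread, here is a minimal Python sketch. It is not part of the TPP or msconvert, and the function names `id_ref_style` and `pepxml_spectrum_name` are hypothetical, chosen only for illustration. It assumes the index tail has already been read into a string, and that the spectrum name follows the `basename.scan.scan.charge` pattern with five-digit zero padding seen in the pepXML excerpts above.

```python
import re

# One <offset> entry of the mzML index, e.g.
# <offset idRef="index=26193">437068879</offset>
OFFSET_RE = re.compile(r'<offset idRef="([^"]*)">(\d+)</offset>')


def id_ref_style(index_tail: str) -> str:
    """Report how spectra are identified in an mzML index tail.

    Returns "scan" if idRefs carry native scan numbers, "index" if they
    are only "index=N", "mixed" if more than one style occurs, and
    "none" if no spectrum entries were found. (Hypothetical helper.)
    """
    styles = set()
    for id_ref, _offset in OFFSET_RE.findall(index_tail):
        if id_ref == "TIC":
            # Chromatogram entry, not a spectrum.
            continue
        if re.search(r"\bscan=\d+", id_ref):
            styles.add("scan")
        elif re.fullmatch(r"index=\d+", id_ref):
            styles.add("index")
        else:
            styles.add("other")
    if not styles:
        return "none"
    if len(styles) == 1:
        return styles.pop()
    return "mixed"


def pepxml_spectrum_name(base_name: str, zero_based_index: int, charge: int) -> str:
    """Build a spectrum name that uses the 1-based index as the scan number.

    Assumes the basename.scan.scan.charge layout from the excerpts above.
    (Hypothetical helper.)
    """
    one_based = zero_based_index + 1
    return f"{base_name}.{one_based:05d}.{one_based:05d}.{charge}"


if __name__ == "__main__":
    expected_tail = (
        '<offset idRef="controllerType=0 controllerNumber=1 scan=51139">481410705</offset>\n'
        '<offset idRef="TIC">481418736</offset>'
    )
    actual_tail = '<offset idRef="index=26209">437161858</offset>'
    print(id_ref_style(expected_tail))
    print(id_ref_style(actual_tail))
    print(pepxml_spectrum_name("20131226_HeLa_bRP01_120min", 5056, 2))
```

With the zero-based index 5056 from the example in the thread, the second helper yields the name used in the second `spectrum_query` excerpt. If a file reports "index", the scan numbers in existing pepXML output may not be resolvable against it, which is the problem described above.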
