Re: [spctools-discuss] Can PeptideProphet and iProphet succeed with few matched decoys?

Hannes Röst Fri, 04 Apr 2014 01:28:20 -0700

Dear Vadim

I just checked my X!Tandem input files, and we used these parameters:


  <note type="input" label="output, results">all</note>
    <note>values = all|valid|stochastic</note>
  <note type="input" label="output, maximum valid expectation value">0.1</note>
    <note>value is used in the valid|stochastic setting of output, results</note>

According to the documentation, this writes out all results without
any E-value filtering. As far as I understand the PeptideProphet
algorithm, it needs to see all results, including the low-scoring
(mostly incorrect) matches, and the modelling will very likely not
work as expected if you apply any E-value filtering beforehand. I
would thus suggest that you also set "output, results" to "all". I
have attached the XML files that I currently use, for your convenience.
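
As a quick sanity check before running xinteract, something like the
following could confirm that a parameter file asks X!Tandem for
unfiltered output. This is only a minimal sketch in Python, not part of
X!Tandem or the TPP: the helper names read_inputs and
reports_all_results are made up for illustration, the embedded XML is a
shortened stand-in for a real parameter file, and treating a missing
"output, results" entry as "not all" is my assumption.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for an X!Tandem parameter file (illustrative only).
PARAMS = """<bioml>
  <note type="input" label="output, results">all</note>
  <note>values = all|valid|stochastic</note>
  <note type="input" label="output, maximum valid expectation value">0.1</note>
</bioml>"""


def read_inputs(xml_text):
    """Return {label: value} for every <note type="input" label="..."> node."""
    root = ET.fromstring(xml_text)
    return {
        note.get("label"): (note.text or "").strip()
        for note in root.iter("note")
        if note.get("type") == "input" and note.get("label")
    }


def reports_all_results(params):
    """True if 'output, results' is 'all', i.e. no E-value filtering on output.

    Assumption: a missing entry is treated as 'not all'.
    """
    return params.get("output, results", "").lower() == "all"


params = read_inputs(PARAMS)
unfiltered = reports_all_results(params)
print("output, results =", params.get("output, results"))
print("unfiltered output:", unfiltered)
```

Note that this only inspects a single file. Values inherited from the
file named in "list path, default parameters" would have to be read
and merged separately.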

If you are interested in how we created the spectral libraries for the
OpenSWATH paper, please consult the Methods section in the online
supplementary material. There we describe our searches and how we
converted the X!Tandem results to SpectraST and then to TraML files.

I hope that helps

Hannes


On 4 April 2014 01:24, Vadim Patsalo <[email protected]> wrote:
> Dear Hannes, thank you for your reply.
>
> By "filter," I assume you mean the "maximum valid expectation value" in the 
> output and refine settings of X!Tandem?
> If so, I've performed the following experiment.
>
> Evalue 1e-2:    8 decoys / 24280 non-decoys
> Evalue 1e-1:  105 decoys / 31902 non-decoys
> Evalue 1e0:   793 decoys / 38363 non-decoys
> Evalue 1e1:  3197 decoys / 42706 non-decoys
> Evalue 1e2:  4642 decoys / 44256 non-decoys
>
> Thank you for your advice -- I will attempt to build the Pos and Neg 
> distributions using the more liberal cutoffs.
> My goal is to obtain a SpectraST library to interrogate SWATH datasets, using 
> OpenSWATH, of course!
>
> What kind of decoy to non-decoy ratio is satisfactory for FDR modelling, in 
> your experience?
>
> Vadim
>
> On Apr 3, 2014, at 11:28 AM, Hannes Röst <[email protected]> wrote:
>
>> Hi Vadim
>>
>> This will most likely _not_ work. It would probably be better if you
>> do not filter before xinteract but give it the full X!Tandem output
>> and then filter afterwards based on the computed probabilities. This
>> way you also might increase the total number of retained hits at a
>> fixed FDR.
>>
>> Hannes
>
> --
> You received this message because you are subscribed to the Google Groups 
> "spctools-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/spctools-discuss.
> For more options, visit https://groups.google.com/d/optout.


<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="tandem-input-style.xsl"?>

<bioml>

<note>list path parameters</note>

	<note type="input" label="list path, default parameters">default_input.xml</note>

		<note>This value is ignored when it is present in the default parameter

		list path.</note>

	<note type="input" label="list path, taxonomy information">taxonomy.xml</note>



<note>spectrum parameters</note>

	<note type="input" label="spectrum, fragment monoisotopic mass error">0.5</note>

	<note type="input" label="spectrum, parent monoisotopic mass error plus">3.0</note>

	<note type="input" label="spectrum, parent monoisotopic mass error minus">3.0</note>

	<note type="input" label="spectrum, parent monoisotopic mass isotope error">no</note>

	<note type="input" label="spectrum, fragment monoisotopic mass error units">Daltons</note>

	<note>The value for this parameter may be 'Daltons' or 'ppm': all other values are ignored</note>

	<note type="input" label="spectrum, parent monoisotopic mass error units">Daltons</note>

		<note>The value for this parameter may be 'Daltons' or 'ppm': all other values are ignored</note>

	<note type="input" label="spectrum, fragment mass type">monoisotopic</note>

		<note>values are monoisotopic|average </note>



<note>spectrum conditioning parameters</note>

	<note type="input" label="spectrum, dynamic range">1000.0</note>

		<note>The peaks read in are normalized so that the most intense peak

		is set to the dynamic range value. All peaks with values of less than

		1, using this normalization, are not used. This normalization has the

		overall effect of setting a threshold value for peak intensities.</note>

	<note type="input" label="spectrum, total peaks">50</note> 

		<note>If this value is 0, it is ignored. If it is greater than zero (let's say 50),

		then the number of peaks in the spectrum will be limited to the 50 most intense

		peaks in the spectrum. X! tandem does not do any peak finding: it only

		limits the peaks used by this parameter, and the dynamic range parameter.</note>

	<note type="input" label="spectrum, maximum parent charge">3</note>

	<note type="input" label="spectrum, use noise suppression">yes</note>

	<note type="input" label="spectrum, minimum parent m+h">300.0</note>

	<note type="input" label="spectrum, maximum parent m+h">4094.0</note>

	<note type="input" label="spectrum, minimum fragment mz">150.0</note>

	<note type="input" label="spectrum, minimum peaks">6</note> 

	<note type="input" label="spectrum, threads">1</note>

	

<note>residue modification parameters</note>

	<note type="input" label="residue, modification mass">57.021464@C</note>

		<note>The format of this parameter is m@X, where m is the modification

		mass in Daltons and X is the appropriate residue to modify. Lists of

		modifications are separated by commas. For example, to modify M and C

		with the addition of 16.0 Daltons, the parameter line would be

		+16.0@M,+16.0@C

		Positive and negative values are allowed.

		</note>

	<note type="input" label="residue, potential modification mass">15.994915@M</note>

		<note>The format of this parameter is the same as the format

		for residue, modification mass (see above).</note>

	<note type="input" label="residue, potential modification motif"></note>

		<note>The format of this parameter is similar to residue, modification mass,

		with the addition of a modified PROSITE notation sequence motif specification.

		For example, a value of 80@[ST!]PX[KR] indicates a modification

		of either S or T when followed by P, any residue, and then a K or an R.

		A value of 204@N!{P}[ST]{P} indicates a modification of N by 204, if it

		is NOT followed by a P, then either an S or a T, NOT followed by a P.

		Positive and negative values are allowed.

		</note>



<note>protein parameters</note>

	<note type="input" label="protein, taxon">no default</note>

		<note>This value is interpreted using the information in taxonomy.xml.</note>

	<note type="input" label="protein, cleavage site">[RK]|{P}</note>

		<note>this setting corresponds to the enzyme trypsin. The first characters

		in brackets represent residues N-terminal to the bond - the '|' pipe -

		and the second set of characters represent residues C-terminal to the

		bond. The characters must be in square brackets (denoting that only

		these residues are allowed for a cleavage) or french brackets (denoting

		that these residues cannot be in that position). Use UPPERCASE characters.

		To denote cleavage at any residue, use [X]|[X] and reset the 

		scoring, maximum missed cleavage site parameter (see below) to something like 50.

		</note>

	<note type="input" label="protein, N-terminal residue modification mass">0.0</note>

	<note type="input" label="protein, C-terminal residue modification mass">0.0</note>

	<note type="input" label="protein, homolog management">no</note>

		<note>if yes, an upper limit is set on the number of homologues kept for a particular spectrum</note>



<note>model refinement parameters</note>

		<note type="input" label="refine">no</note>

		<note type="input" label="refine, spectrum synthesis">yes</note>

		<note type="input" label="refine, maximum valid expectation value">0.1</note>

		<note type="input" label="refine, potential N-terminus modifications"></note>

		<note type="input" label="refine, potential C-terminus modifications"></note>

		<note type="input" label="refine, unanticipated cleavage">no</note>

		<note type="input" label="refine, cleavage semi">no</note>

		<note type="input" label="refine, point mutations">no</note>

		<note type="input" label="refine, use potential modifications for full refinement">yes</note>

		<note type="input" label="refine, potential modification motif"></note>

		<note>The format of this parameter is similar to residue, modification mass,

		with the addition of a modified PROSITE notation sequence motif specification.

		For example, a value of 80@[ST!]PX[KR] indicates a modification

		of either S or T when followed by P, any residue, and then a K or an R.

		A value of 204@N!{P}[ST]{P} indicates a modification of N by 204, if it

		is NOT followed by a P, then either an S or a T, NOT followed by a P.

		Positive and negative values are allowed.

		</note>



<note>scoring parameters</note>

	<note type="input" label="scoring, minimum ion count">4</note>

	<note type="input" label="scoring, maximum missed cleavage sites">2</note>

	<note type="input" label="scoring, x ions">no</note>

	<note type="input" label="scoring, y ions">yes</note>

	<note type="input" label="scoring, z ions">no</note>

	<note type="input" label="scoring, a ions">no</note>

	<note type="input" label="scoring, b ions">yes</note>

	<note type="input" label="scoring, c ions">no</note>

	<note type="input" label="scoring, cyclic permutation">no</note>

		<note>if yes, cyclic peptide sequence permutation is used to pad the scoring histograms</note>

	<note type="input" label="scoring, include reverse">no</note>

		<note>if yes, then reversed sequences are searched at the same time as forward sequences</note>



<note>output parameters</note>

	<note type="input" label="output, message">testing 1 2 3</note>

	<note type="input" label="output, path">output.xml</note>

	<note type="input" label="output, sort results by">spectrum</note>

		<note>values = protein|spectrum (spectrum is the default)</note>

	<note type="input" label="output, path hashing">no</note>

		<note>values = yes|no</note>

	<note type="input" label="output, xsl path">tandem-style.xsl</note>

	<note type="input" label="output, parameters">yes</note>

		<note>values = yes|no</note>

	<note type="input" label="output, performance">yes</note>

		<note>values = yes|no</note>

	<note type="input" label="output, spectra">yes</note>

		<note>values = yes|no</note>

	<note type="input" label="output, histograms">no</note>

		<note>values = yes|no</note>

	<note type="input" label="output, proteins">yes</note>

		<note>values = yes|no</note>

	<note type="input" label="output, sequences">no</note>

		<note>values = yes|no</note>

	<note type="input" label="output, one sequence copy">yes</note>

		<note>values = yes|no, set to yes to produce only one copy of each protein sequence in the output xml</note>

	<note type="input" label="output, results">all</note>

		<note>values = all|valid|stochastic</note>

	<note type="input" label="output, maximum valid expectation value">0.1</note>

		<note>value is used in the valid|stochastic setting of output, results</note>

	<note type="input" label="output, histogram column width">30</note>

		<note>values = any integer greater than 0. Setting this to '1' makes cutting and pasting histograms

		into spreadsheet programs easier.</note>

<note type="description">ADDITIONAL EXPLANATIONS</note>

	<note type="description">Each one of the parameters for X! tandem is entered as a labeled note

			node. In the current version of X! tandem, keep those note nodes

			on a single line.

	</note>

	<note type="description">The presence of the type 'input' is necessary if a note is to be considered

			an input parameter.

	</note>

	<note type="description">Any of the parameters that are paths to files may require alteration for a 

			particular installation. Full path names usually cause the least trouble,

			but there is no reason not to use relative path names, if that is the

			most convenient.

	</note>

	<note type="description">Any parameter values set in the 'list path, default parameters' file are

			reset by entries in the normal input file, if they are present. Otherwise,

			the default set is used.

	</note>

	<note type="description">The 'list path, taxonomy information' file must exist.

		</note>

	<note type="description">The directory containing the 'output, path' file must exist: it will not be created.

		</note>

	<note type="description">The 'output, xsl path' is optional: it is only of use if a good XSLT style sheet exists.

		</note>



</bioml>

<?xml version="1.0"?>
<bioml>
	<note>
	Each one of the parameters for x! tandem is entered as a labeled note node. 
	Any of the entries in the default_input.xml file can be over-ridden by
	adding a corresponding entry to this file. This file represents a minimum
	input file, with only entries for the default settings, the output file
	and the input spectra file name. 
	See the taxonomy.xml file for a description of how FASTA sequence list 
	files are linked to a taxon name.
	</note>

	<note type="input" label="spectrum, parent monoisotopic mass error plus">50</note>
	<note type="input" label="spectrum, parent monoisotopic mass error minus">50</note>
	<note type="input" label="spectrum, parent monoisotopic mass error units">ppm</note>


	<note type="input" label="list path, default parameters">default_input.xml</note>
	<note type="input" label="list path, taxonomy information">taxonomy.xml</note>
	<note type="input" label="protein, taxon">spyo_sprot</note>
	
	<note type="input" label="residue, potential modification mass">15.994915@M</note>

	<note type="input" label="spectrum, path">mzXMLs/hroest_L120218_.mzXML</note>

    <note type="input" label="output, path">test_out.xml</note>
	<note label="scoring, algorithm" type="input">k-score</note>
	<note label="spectrum, use conditioning" type="input">no</note>
	<note label="scoring, minimum ion count" type="input">1</note>
</bioml>
