I actually found out that I left some data out of one of the fasta
files. So, you can disregard the descriptions of my runs I posted
before.

My question is probably more about SEQUEST behavior. But it also
includes the behavior of PeptideProphet.

My question is basically: will using a large database_file (in fasta
format) affect the SEQUEST output or the PeptideProphet output in a
positive or negative way?

In other words, will a larger database of sequences give more or fewer
valid peptide matches?
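
To make the size question concrete, here is a minimal Python sketch (not
part of TPP or SEQUEST) of how truncated entries like the ones in the
example from my earlier message, quoted below, could be generated. The
function names and the "_Ntrunc" header labels are made up for
illustration, and the cutoff of 8 removed residues simply matches that
example.

```python
def n_terminal_truncations(sequence, max_removed):
    """Return the sequence plus copies with 1..max_removed residues
    removed from the N-terminus (never truncating to an empty string)."""
    limit = min(max_removed, len(sequence) - 1)
    return [sequence[i:] for i in range(limit + 1)]


def to_fasta(name, variants):
    """Format the variants as fasta text. The header naming scheme is a
    hypothetical choice for this sketch, not a TPP/SEQUEST convention."""
    lines = []
    for removed, seq in enumerate(variants):
        lines.append(f">{name}_Ntrunc{removed}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


original = "MVMNDANQAQITATFKTK"
variants = n_terminal_truncations(original, max_removed=8)
total_residues = sum(len(v) for v in variants)

print(to_fasta("example_protein", variants))
print(len(variants), "entries,", total_residues, "residues vs", len(original))
```

For this single 18-residue sequence the result is 9 entries and 126
residues in total, i.e. 7 times the original, so the number of candidate
sequences grows quickly when the same is done for every protein.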

-Kris

On Jun 15, 11:13 am, "Brian Pratt" <[email protected]> wrote:
> Hi Kris,
>
> So is this a question about SEQUEST performance and behavior?  You might be
> better off asking Thermo about that.  On the other hand there are lots of
> SEQUEST users on this list so you might get an answer here too...
>
> Brian
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Kris
> Sent: Monday, June 15, 2009 7:31 AM
> To: spctools-discuss
> Subject: [spctools-discuss] fasta file length and match precision
>
> Sorry if this is covered somewhere else, I was unable to find the
> answer to this question.
>
> I am using TPP with SEQUEST to search for peptides in mass spectra.
>
> I had an original fasta file (specified in the parameters file by
> "database_name=") that I have been using for all my database_search
> runs. I just generated a new fasta file that included the possibility
> of amino acids being cut off the N-terminus of the protein fragments.
>
> E.G.
> If the original fasta file contained MVMNDANQAQITATFKTK
>
> The new fasta file contains
> MVMNDANQAQITATFKTK
> VMNDANQAQITATFKTK
> MNDANQAQITATFKTK
> NDANQAQITATFKTK
> DANQAQITATFKTK
> ANQAQITATFKTK
> NQAQITATFKTK
> QAQITATFKTK
> AQITATFKTK
>
> These sequences were included because I was concerned that the
> database_search would not find proteins where the N-terminus was
> modified. (My work mainly concerns the N-termini of proteins).
>
> I ran some preliminary tests to see how this would affect the run
> speed and results.
>
> Using the original fasta file, a run took 3.5 hours. Using the new
> longer fasta file (which has all the sequences of the original file
> and more), the same run took 2 hours.
>
> There were about 6000 peptides found when run with the original fasta
> file and only about 2500 peptides found when run with the new longer
> fasta file.
>
> Using the new fasta file, I found some peptide matches with
> probability of 1 that had slightly lower probability (around 0.9) when
> using the original fasta file.
>
> MY QUESTIONS:
>
> Does using a longer fasta file somehow cause a loss of precision when
> searching for peptide matches in mass spec data?
>
> How does the fasta file length (and specifically adding the same
> sequences with a shortened N-terminus) affect the database_searches
> and more importantly the peptide matches output?
>
> Thanks,
> Kris