I found out that I left some data out of one of the fasta files, so you can disregard the descriptions of my runs that I posted before.
My question is probably more about SEQUEST behavior, but it also involves the behavior of PeptideProphet. Basically: will using a large database_file (in fasta format) affect the SEQUEST output or the PeptideProphet output in a positive or negative way? In other words, will a larger database of sequences give more or fewer valid peptide matches?

-Kris

On Jun 15, 11:13 am, "Brian Pratt" <[email protected]> wrote:
> Hi Kris,
>
> So is this a question about SEQUEST performance and behavior? You might be
> better off asking Thermo about that. On the other hand there are lots of
> SEQUEST users on this list so you might get an answer here too...
>
> Brian
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Kris
> Sent: Monday, June 15, 2009 7:31 AM
> To: spctools-discuss
> Subject: [spctools-discuss] fasta file length and match precision
>
> Sorry if this is covered somewhere else, I was unable to find the
> answer to this question.
>
> I am using TPP with SEQUEST to search for peptides in mass spectra.
>
> I had an original fasta file (specified in the parameters file by
> "database_name=") that I have been using for all my database_search
> runs. I just generated a new fasta file that included the possibility
> of amino acids being cut off the N-terminus of the protein fragments.
>
> E.g., if the original fasta file contained
> MVMNDANQAQITATFKTK
>
> the new fasta file contains
> MVMNDANQAQITATFKTK
> VMNDANQAQITATFKTK
> MNDANQAQITATFKTK
> NDANQAQITATFKTK
> DANQAQITATFKTK
> ANQAQITATFKTK
> NQAQITATFKTK
> QAQITATFKTK
> AQITATFKTK
>
> These sequences were included because I was concerned that the
> database_search would not find proteins where the N-terminus was
> modified. (My work mainly concerns the N-termini of proteins.)
>
> I ran some preliminary tests to see how this would affect the run
> speed and results.
>
> Using the original fasta file, a run took 3.5 hours. Using the new
> longer fasta file (which has all the sequences of the original file
> and more), the same run took 2 hours.
>
> There were about 6000 peptides found when run with the original fasta
> file and only about 2500 peptides found when run with the new longer
> fasta file.
>
> Using the new fasta file, I found some peptide matches with a
> probability of 1 that had a slightly lower probability (around 0.9) when
> using the original fasta file.
>
> MY QUESTIONS:
>
> Does using a longer fasta file somehow cause a loss of precision when
> searching for peptide matches in mass spec data?
>
> How does the fasta file length (and specifically adding the same
> sequences with a shortened N-terminus) affect the database_searches
> and, more importantly, the peptide matches output?
>
> Thanks,
> Kris
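
For reference, here is a minimal sketch of how a fasta file with N-terminally truncated entries, like the one described in the quoted message, could be generated. It is not the script that was actually used. The function names, the `example_protein` header, the `_Ntrunc` header suffix, and the default of 8 removed residues (inferred from the example list in the quoted message) are all assumptions made for illustration.

```python
def n_terminal_truncations(sequence, max_removed=8):
    """Return the full sequence followed by copies with 1..max_removed
    residues removed from the N-terminus (at least one residue is kept)."""
    limit = min(max_removed, len(sequence) - 1)
    return [sequence[i:] for i in range(limit + 1)]


def build_truncated_fasta(records, max_removed=8):
    """Build FASTA text from (header, sequence) pairs, adding one entry
    per truncated variant. The "_Ntrunc<k>" suffix is a hypothetical
    naming scheme, used only to keep the headers unique."""
    lines = []
    for header, sequence in records:
        variants = n_terminal_truncations(sequence, max_removed)
        for removed, variant in enumerate(variants):
            suffix = "" if removed == 0 else f"_Ntrunc{removed}"
            lines.append(f">{header}{suffix}")
            lines.append(variant)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The header is made up; the sequence is the example from the message.
    print(build_truncated_fasta([("example_protein", "MVMNDANQAQITATFKTK")]), end="")
```

With the example sequence this produces the nine entries listed in the quoted message, from the full-length sequence down to AQITATFKTK. Every protein in the original file gains up to eight extra entries this way, so the search space grows considerably.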
