This may not be the whole story, but in my experience highly redundant sequence databases tend to be "penalized" due to the parsimonious inference model of ProteinProphet.
If a peptide is proteotypic (it can only come from one protein in the database), then all of its "weight" applies to that protein. If a peptide could come from several proteins in the database, its "weight" is split among all of them. As a result, proteins supported mainly by shared peptides can fall below the FDR cutoff threshold.

You can try using the GROUPWTS flag when running ProteinProphet, or do some pre-filtering of your protein database (as you've already done) to remove degenerate/redundant sequences. I've had some luck using the UniRef90 databases provided by UniProt.

On Wednesday, October 30, 2013 3:25:31 AM UTC-7, ato wrote:
>
> Dear All,
>
> I am a new user of the TPP software and I am trying to set up a protocol for
> my runs. I want to search my sample against a UniProt database. I've compared
> the results from the same files, with the same settings, but using two UniProt
> databases generated in different ways:
> The 1st database is the complete human proteome downloaded from the website:
> ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/proteomes/
> The 2nd database was generated manually and contains only the reviewed
> proteins (no isoforms, only canonical sequence data in FASTA format).
>
> In both cases I created concatenated databases and used them for the run
> in TPP.
>
> After running ProteinProphet, I filtered each list according to the
> estimated sensitivity/error rate (in both cases I kept only the proteins
> at or above the probability for which the estimated number of incorrect
> proteins is 1). I compared the resulting lists (entries) using a Venn diagram
> and found that more than 200 proteins are unique to the run with the
> reviewed database. There are a lot of common proteins, and the group unique
> to the complete proteome consists mostly of isoforms or unreviewed proteins,
> of course.
>
> Can somebody explain to me why we get more unique additional proteins if we
> run the sample with the reviewed/shorter database?
>
> Thank you for your help!
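
P.S. To make the weight splitting described above concrete, here is a toy sketch in Python. It is not ProteinProphet's actual algorithm (which, as far as I understand, adjusts the weights iteratively rather than splitting them equally). The function name, the equal split, the peptide probabilities, and the protein labels are all assumptions made up for illustration.

```python
from collections import defaultdict


def toy_protein_probabilities(peptide_probs, peptide_to_proteins):
    """Toy model, NOT the real ProteinProphet computation.

    Assumption: each peptide's weight (1.0) is split equally among all
    database entries that contain it, and a protein's probability is
    1 - prod(1 - weight * peptide_probability) over its peptides.
    """
    miss = defaultdict(lambda: 1.0)  # probability that a protein is absent
    for peptide, prob in peptide_probs.items():
        proteins = peptide_to_proteins[peptide]
        weight = 1.0 / len(proteins)  # equal split (simplifying assumption)
        for protein in proteins:
            miss[protein] *= 1.0 - weight * prob
    return {protein: 1.0 - m for protein, m in miss.items()}


# Two confident peptide identifications (hypothetical values).
peptide_probs = {"PEP1": 0.9, "PEP2": 0.9}

# Reviewed-only database: both peptides are proteotypic for protein "A".
reviewed = {"PEP1": ["A"], "PEP2": ["A"]}

# Redundant database: the same peptides also occur in an isoform and in
# an unreviewed entry (hypothetical labels).
redundant = {
    "PEP1": ["A", "A-2", "A_unreviewed"],
    "PEP2": ["A", "A-2", "A_unreviewed"],
}

print(toy_protein_probabilities(peptide_probs, reviewed)["A"])
print(toy_protein_probabilities(peptide_probs, redundant)["A"])
```

In this toy setup, protein A scores roughly 0.99 with the reviewed database but only about 0.51 with the redundant one, even though the peptide evidence is identical. That is the kind of drop that can push a protein below the cutoff.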
