Dear All,

I have created a decoy database from the latest UniProt/Swiss-Prot release (uniprot_sprot_April2010) by modifying the original Matrix Science decoy.pl script so that it writes a customized header format.
--------------------------------------------------------------------------------
Case A: Here are a few decoy entries from the database created with decoy.pl:
--------------------------------------------------------------------------------
>sp|decoy_Q197F8|002R_IIV3 Reverse sequence, was Uncharacterized protein 002R OS=Invertebrate iridescent virus 3 GN=IIV3-002R PE=4 SV=1
CDDESDSDDDESDSDYEFDEDESNYNSDLSCFYRDISQIKKPKPKFGLSKMLNTITLKGK
ELIRAAVSPEEETETEWDSDDSDPAWTPDDEDSSDEEGSSFSEDDFHENEPDSDSEYGTG
DRNLQSIRLLFKYMEVQAPTKLPNNEVWDTDRKHLITSYMYRRHFKNDEVRYGHWEDEQF
LVWVYIPRPVNEYRYSNTCYFTEINDTCWQVFPKNVGYTEFIYLYKLTSQMSKIGDMVPG
FWRPHMDLNKITLHTLSPAWFDTFGCADEVMLHTLFELDDLGLDGHTNYNMDDIRRIVLR
TVNPTTELIAKIEEARTDTFHIEQIKEPFAYERQFDEFTIASVRVVNDKYWRTWLWPHKC
WSIQEWSLYQMIDLKIELPLLELSEVPGSYDESPESTQLEPYREALAQERNMKWPVISGP
QENWIPDFLLFQAVDQINSFDRVPRNSGGQASVTNSAM
>sp|decoy_Q197F7|003L_IIV3 Reverse sequence, was Uncharacterized protein 003L OS=Invertebrate iridescent virus 3 GN=IIV3-003L PE=4 SV=1
IGYTLPELRCTGYNNKRTKSNTILLRYTDPATNSTTSPRDSAVCECRQPSKASGFDFCLT
GIRRPPNIHPPSDVMGLSTPPTCAALSPPTCTTLSPTTTLSRANLSTDFWAGGLANPHVP
YYNPYHPAGSMKCVIERELQPSGYWSQPCPNIAQYM
--------------------------------------------------------------------------------
The original entries corresponding to the decoy entries above are as follows:
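For reference, the Case A header convention can be reproduced with a short Python sketch. This is not the modified decoy.pl itself; it only mirrors the pattern visible in the entries above (the whole sequence reversed, `decoy_` prepended to the accession field only, and "Reverse sequence, was" prepended to the description). The function name `make_decoy_entry` and its parameters are hypothetical, and it assumes every target header starts with a `db|ACCESSION|ENTRY_NAME` token.

```python
def make_decoy_entry(header: str, sequence: str, label: str = "decoy_", width: int = 60) -> str:
    """Build a reversed-sequence decoy FASTA entry in the Case A header style.

    `header` is the original UniProt header without the leading '>', e.g.
    'sp|Q197F8|002R_IIV3 Uncharacterized protein 002R OS=...'.
    Assumes the identifier token has the form 'db|ACCESSION|ENTRY_NAME'.
    """
    # Separate the identifier token from the free-text description.
    ident, _, description = header.partition(" ")
    db, accession, entry_name = ident.split("|", 2)
    # Only the accession field gets the label, as in '>sp|decoy_Q197F8|002R_IIV3'.
    decoy_header = f">{db}|{label}{accession}|{entry_name} Reverse sequence, was {description}"
    # The decoy sequence is the complete target sequence read backwards.
    reversed_seq = sequence[::-1]
    # Wrap the sequence at `width` residues per line (60 in the entries shown).
    lines = [reversed_seq[i:i + width] for i in range(0, len(reversed_seq), width)]
    return "\n".join([decoy_header] + lines)
```

Applied to the Q197F7 target entry, this yields a header of the form `>sp|decoy_Q197F7|003L_IIV3 Reverse sequence, was Uncharacterized protein 003L ...`, matching the second decoy entry above.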
--------------------------------------------------------------------------------
>sp|Q197F8|002R_IIV3 Uncharacterized protein 002R OS=Invertebrate iridescent virus 3 GN=IIV3-002R PE=4 SV=1
MASNTVSAQGGSNRPVRDFSNIQDVAQFLLFDPIWNEQPGSIVPWKMNREQALAERYPEL
QTSEPSEDYSGPVESLELLPLEIKLDIMQYLSWEQISWCKHPWLWTRWYKDNVVRVSAIT
FEDFQREYAFPEKIQEIHFTDTRAEEIKAILETTPNVTRLVIRRIDDMNYNTHGDLGLDD
LEFLTHLMVEDACGFTDFWAPSLTHLTIKNLDMHPRWFGPVMDGIKSMQSTLKYLYIFET
YGVNKPFVQWCTDNIETFYCTNSYRYENVPRPIYVWVLFQEDEWHGYRVEDNKFHRRYMY
STILHKRDTDWVENNPLKTPAQVEMYKFLLRISQLNRDGTGYESDSDPENEHFDDESFSS
GEEDSSDEDDPTWAPDSDDSDWETETEEEPSVAARILEKGKLTITNLMKSLGFKPKPKKI
QSIDRYFCSLDSNYNSEDEDFEYDSDSEDDDSDSEDDC
>sp|Q197F7|003L_IIV3 Uncharacterized protein 003L OS=Invertebrate iridescent virus 3 GN=IIV3-003L PE=4 SV=1
MYQAINPCPQSWYGSPQLEREIVCKMSGAPHYPNYYPVHPNALGGAWFDTSLNARSLTTT
PSLTTCTPPSLAACTPPTSLGMVDSPPHINPPRRIGTLCFDFGSAKSPQRCECVASDRPS
TTSNTAPDTYRLLITNSKTRKNNYGTCRLEPLTYGI
--------------------------------------------------------------------------------
Case B: Here is the format of the decoy entries when I use decoyFasta from the TPP:
--------------------------------------------------------------------------------
>decoy_1
...............................
..............................
>decoy_2
............................
............................
--------------------------------------------------------------------------------
I used the same parameters and input files for the OMSSA search and for PeptideProphet in both cases, but PeptideProphet ends with a segmentation fault in Case A, whereas it runs OK in Case B.
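The two decoy header styles above differ in where the "decoy" label sits inside the first token of the header: at the very start in Case B (`decoy_1`), but embedded in the accession field in Case A (`sp|decoy_Q197F8|002R_IIV3`). Which part of that token reaches the pepXML `protein` attribute depends on how the search engine parses the header, as the pepXML diff further below shows for the target entries. A minimal Python sketch to classify and tally headers (the helper names are hypothetical):

```python
from collections import Counter
from typing import Iterable


def header_id(header_line: str) -> str:
    """Return the first whitespace-delimited token of a FASTA header line."""
    parts = header_line.lstrip(">").split(None, 1)
    return parts[0] if parts else ""


def label_position(header_line: str, label: str = "decoy") -> str:
    """Classify where `label` occurs in the header's identifier token."""
    ident = header_id(header_line)
    if ident.startswith(label):
        return "prefix"    # e.g. '>decoy_1' (Case B style)
    if label in ident:
        return "embedded"  # e.g. '>sp|decoy_Q197F8|002R_IIV3' (Case A style)
    return "absent"        # ordinary target entry


def tally_label_positions(lines: Iterable[str], label: str = "decoy") -> Counter:
    """Count header lines of a FASTA stream by label position; sequence lines are skipped."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            counts[label_position(line, label)] += 1
    return counts
```

Usage, with a placeholder file name: `with open("decoy_database.fasta") as fh: print(tally_label_positions(fh))`.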
--------------------------------------------------------------------------------
Here is the STDOUT for Case A (when I use decoy.pl):
--------------------------------------------------------------------------------
[r...@apcf-hn3 jagan-J442]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/InteractParser 'jagan-J442.pepprophet.xml' 'jagan-J442.pep.xml' -L'7' -E'trypsin' -C -P
file 1: jagan-J442.pep.xml
processed altogether 123 results
results written to file /mnt/sanfs/APCF/results/omssa/2010-08-02/jagan-J442/jagan-J442.pepprophet.shtml
[r...@apcf-hn3 jagan-J442]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/PeptideProphetParser 'jagan-J442.pepprophet.xml' DECOY=decoy MINPROB=0 NONPARAM
Using Decoy Label "decoy".
Using non-parametric distributions (OMSSA) (minprob 0)
WARNING!! The discriminant function for OMSSA is not yet complete. It is presented here to help facilitate trial and discussion. Reliance on this code for publishable scientific results is not recommended.
init with OMSSA trypsin
MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN
PeptideProphet (TPP v4.4 JETSTREAM (unstable development prerelease) rev 0, Build 201007011135 (linux)) akel...@isb
read in 0 1+, 78 2+, 45 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
Initialising statistical models ...
Found 0 Decoys, and 123 Non-Decoys
WARNING: No decoys with label decoy were found in this dataset. reverting to fully unsupervised method.
Iterations: .........10.........20
Segmentation fault
[r...@apcf-hn3 jagan-J442]#
--------------------------------------------------------------------------------
In Case B (decoyFasta from the TPP), here is the STDOUT:
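PeptideProphet reported "Found 0 Decoys, and 123 Non-Decoys" for Case A. One way to cross-check that count outside the TPP is to look directly at the rank-1 hits in the pepXML and see whether any protein identifier starts with the decoy label. The Python sketch below does this with the standard library only. The prefix rule, and treating a hit as a decoy only when its `protein` and every `alternative_protein` carry the label, are my assumptions, not PeptideProphet's actual implementation; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...pepXML}search_hit' -> 'search_hit'."""
    return tag.rsplit("}", 1)[-1]


def count_decoy_hits(root: ET.Element, label: str = "decoy") -> tuple:
    """Return (decoys, non_decoys) over the rank-1 search_hit elements of a pepXML tree.

    Assumption: a hit counts as a decoy only when its `protein` attribute and the
    `protein` attribute of every <alternative_protein> child start with `label`.
    """
    decoys = non_decoys = 0
    for hit in root.iter():
        # Namespaces are ignored so the check works with or without the pepXML xmlns.
        if local_name(hit.tag) != "search_hit" or hit.get("hit_rank") != "1":
            continue
        proteins = [hit.get("protein", "")]
        proteins += [alt.get("protein", "") for alt in hit
                     if local_name(alt.tag) == "alternative_protein"]
        if all(p.startswith(label) for p in proteins):
            decoys += 1
        else:
            non_decoys += 1
    return decoys, non_decoys
```

Usage: `print(count_decoy_hits(ET.parse("jagan-J442.pep.xml").getroot()))`. If the count is zero here as well, no top-ranked hit in that file carries a protein identifier beginning with "decoy".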
--------------------------------------------------------------------------------
[r...@apcf-hn3 jagan-J443]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/InteractParser 'jagan-J443.pepprophet.xml' 'jagan-J443.pep.xml' -L'7' -E'trypsin' -C -P
file 1: jagan-J443.pep.xml
processed altogether 123 results
results written to file /mnt/sanfs/APCF/results/omssa/2010-08-02/jagan-J443/jagan-J443.pepprophet.shtml
[r...@apcf-hn3 jagan-J443]# /mnt/sanfs/APCF/APCF_WEB/tpp/bin/PeptideProphetParser 'jagan-J443.pepprophet.xml' DECOY=decoy MINPROB=0 NONPARAM
Using Decoy Label "decoy".
Using non-parametric distributions (OMSSA) (minprob 0)
WARNING!! The discriminant function for OMSSA is not yet complete. It is presented here to help facilitate trial and discussion. Reliance on this code for publishable scientific results is not recommended.
init with OMSSA trypsin
MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN
PeptideProphet (TPP v4.4 JETSTREAM (unstable development prerelease) rev 0, Build 201007011135 (linux)) akel...@isb
read in 0 1+, 78 2+, 45 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
Initialising statistical models ...
Found 2 Decoys, and 121 Non-Decoys
Iterations: .........10.........20.....
WARNING: Mixture model quality test failed for charge (1+).
WARNING: Mixture model quality test failed for charge (2+).
WARNING: Mixture model quality test failed for charge (4+).
WARNING: Mixture model quality test failed for charge (5+).
WARNING: Mixture model quality test failed for charge (6+).
WARNING: Mixture model quality test failed for charge (7+).
model complete after 26 iterations
[r...@apcf-hn3 jagan-J443]#
--------------------------------------------------------------------------------
What is the best way to make the Case A semantics (decoy.pl-style headers) work with the TPP pipeline?
Here is the diff between the pepXML files of the two cases; lines marked "<" come from Case B (jagan-J443) and lines marked ">" from Case A (jagan-J442):
--------------------------------------------------------------------------------
< date="2010-08-02T11:15:41" summary_xml="/home/APCF/omssa/results/2b78d709e0fc1276e3bdff7faa1c95a8/jagan-J443.pep.xml">
< <msms_run_summary base_name="/home/APCF/omssa/results/2b78d709e0fc1276e3bdff7faa1c95a8/jagan-j443_62928" raw_data_type="raw" raw_data=".mzXML">
---
> date="2010-08-02T10:52:40" summary_xml="/home/APCF/omssa/results/091b4664500d7e67d0eba75ef9170064/jagan-J442.pep.xml">
> <msms_run_summary base_name="/home/APCF/omssa/results/091b4664500d7e67d0eba75ef9170064/jagan-j442_62921" raw_data_type="raw" raw_data=".mzXML">
11c11
< <search_summary base_name="/home/APCF/omssa/results/2b78d709e0fc1276e3bdff7faa1c95a8/jagan-j443_62928" search_engine="OMSSA" precursor_mass_type="monoisotopic" fragment_mass_type="monoisotopic" out_data_type="n/a" out_data="n/a" search_id="1">
---
> <search_summary base_name="/home/APCF/omssa/results/091b4664500d7e67d0eba75ef9170064/jagan-j442_62921" search_engine="OMSSA" precursor_mass_type="monoisotopic" fragment_mass_type="monoisotopic" out_data_type="n/a" out_data="n/a" search_id="1">
17c17
< <search_hit hit_rank="1" peptide="KENNNNNNNK" peptide_prev_aa="K" peptide_next_aa="N" protein="285922" num_tot_proteins="1" num_matched_ions="16" tot_num_ions="18" calc_neutral_pep_mass="1201.545" massdiff="-0.862000000000007" is_rejected="0" protein_descr="sp|Q54UC0|PRKDC_DICDI DNA-dependent protein kinase catalytic subunit OS=Dictyostelium discoideum GN=dnapkcs PE=3 SV=2">
---
> <search_hit hit_rank="1" peptide="KENNNNNNNK" peptide_prev_aa="K" peptide_next_aa="N" protein="Q54UC0" num_tot_proteins="1" num_matched_ions="16" tot_num_ions="18" calc_neutral_pep_mass="1201.545" massdiff="-0.862000000000007" is_rejected="0" protein_descr="DNA-dependent protein kinase catalytic subunit OS=Dictyostelium discoideum GN=dnapkcs PE=3 SV=2">
25c25
< <search_hit hit_rank="1" peptide="WQGHEGDIDK" peptide_prev_aa="K" peptide_next_aa="G" protein="132542" num_tot_proteins="1" num_matched_ions="13" tot_num_ions="18" calc_neutral_pep_mass="1183.526" massdiff="-0.000999999999909" is_rejected="0" protein_descr="sp|O95395|GCNT3_HUMAN Beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase 3 OS=Homo sapiens GN=GCNT3 PE=2 SV=1">
---
> <search_hit hit_rank="1" peptide="WQGHEGDIDK" peptide_prev_aa="K" peptide_next_aa="G" protein="O95395" num_tot_proteins="1" num_matched_ions="13" tot_num_ions="18" calc_neutral_pep_mass="1183.526" massdiff="-0.000999999999909" is_rejected="0" protein_descr="Beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase 3 OS=Homo sapiens GN=GCNT3 PE=2 SV=1">
33,34c33,34
< <search_hit hit_rank="1" peptide="SKAEAESLYQSK" peptide_prev_aa="K" peptide_next_aa="Y" protein="183292" num_tot_proteins="2" num_matched_ions="19" tot_num_ions="22" calc_neutral_pep_mass="1339.663" massdiff="-0.001999999999942" is_rejected="0" protein_descr="sp|P04264|K2C1_HUMAN Keratin, type II cytoskeletal 1 OS=Homo sapiens GN=KRT1 PE=1 SV=6">
< <alternative_protein protein="183294" protein_descr="sp|A5A6M6|K2C1_PANTR Keratin, type II cytoskeletal 1 OS=Pan troglodytes GN=KRT1 PE=2 SV=1"/>
---
> <search_hit hit_rank="1" peptide="SKAEAESLYQSK" peptide_prev_aa="K" peptide_next_aa="Y" protein="P04264" num_tot_proteins="2" num_matched_ions="19" tot_num_ions="22" calc_neutral_pep_mass="1339.663" massdiff="-0.001999999999942" is_rejected="0" protein_descr="Keratin, type II cytoskeletal 1 OS=Homo sapiens GN=KRT1 PE=1 SV=6">
> <alternative_protein protein="A5A6M6" protein_descr="Keratin, type II cytoskeletal 1 OS=Pan troglodytes GN=KRT1 PE=2 SV=1"/>
42c42
< <search_hit hit_rank="1" peptide="NQNESVSEIGGK" peptide_prev_aa="R" peptide_next_aa="I" protein="394680" num_tot_proteins="1" num_matched_ions="18" tot_num_ions="22" calc_neutral_pep_mass="1260.595" massdiff="-0.001999999999925" is_rejected="0" protein_descr="sp|Q68CR1|SE1L3_HUMAN Protein sel-1 homolog 3 OS=Homo sapiens GN=SEL1L3 PE=1 SV=2">
---
> <search_hit hit_rank="1" peptide="NQNESVSEIGGK" peptide_prev_aa="R" peptide_next_aa="I" protein="Q68CR1" num_tot_proteins="1" num_matched_ions="18" tot_num_ions="22" calc_neutral_pep_mass="1260.595" massdiff="-0.001999999999925" is_rejected="0" protein_descr="Protein sel-1 homolog 3 OS=Homo sapiens GN=SEL1L3 PE=1 SV=2">
50c50
< <search_hit hit_rank="1" peptide="LVGATATSSPPPK" peptide_prev_aa="R" peptide_next_aa="A" protein="452919" num_tot_proteins="1" num_matched_ions="15" tot_num_ions="24" calc_neutral_pep_mass="1224.672" massdiff="-0.002999999999904" is_rejected="0" protein_descr="sp|Q96QD9|UIF_HUMAN UAP56-interacting factor OS=Homo sapiens GN=FYTTD1 PE=1 SV=3">
---
> <search_hit hit_rank="1" peptide="LVGATATSSPPPK" peptide_prev_aa="R" peptide_next_aa="A" protein="Q96QD9" num_tot_proteins="1" num_matched_ions="15" tot_num_ions="24" calc_neutral_pep_mass="1224.672" massdiff="-0.002999999999904" is_rejected="0" protein_descr="UAP56-interacting factor OS=Homo sapiens GN=FYTTD1 PE=1 SV=3">
58c58
< <search_hit hit_rank="1" peptide="LHQDTFNQLHK" peptide_prev_aa="K" peptide_next_aa="V" protein="136270" num_tot_proteins="1" num_matched_ions="20" tot_num_ions="20" calc_neutral_pep_mass="1379.696" massdiff="-0.003000000000016" is_rejected="0" protein_descr="sp|Q8NCI6|GLBL3_HUMAN Beta-galactosidase-1-like protein 3 OS=Homo sapiens GN=GLB1L3 PE=2 SV=3">
---
> <search_hit hit_rank="1" peptide="LHQDTFNQLHK" peptide_prev_aa="K" peptide_next_aa="V" protein="Q8NCI6" num_tot_proteins="1" num_matched_ions="20" tot_num_ions="20" calc_neutral_pep_mass="1379.696" massdiff="-0.003000000000016" is_rejected="0" protein_descr="Beta-galactosidase-1-like protein 3 OS=Homo sapiens GN=GLB1L3 PE=2 SV=3">
66c66
< <search_hit hit_rank="1" peptide="SVGLGTESTGR" peptide_prev_aa="R" peptide_next_aa="G" protein="136270" num_tot_proteins="1" num_matched_ions="17" tot_num_ions="20" calc_neutral_pep_mass="1062.53" massdiff="0.000999999999949" is_rejected="0" protein_descr="sp|Q8NCI6|GLBL3_HUMAN Beta-galactosidase-1-like protein 3 OS=Homo sapiens GN=GLB1L3 PE=2 SV=3">
---
> <search_hit hit_rank="1" peptide="SVGLGTESTGR" peptide_prev_aa="R" peptide_next_aa="G" protein="Q8NCI6" num_tot_proteins="1" num_matched_ions="17" tot_num_ions="20" calc_neutral_pep_mass="1062.53" massdiff="0.000999999999949" is_rejected="0" protein_descr="Beta-galactosidase-1-like protein 3 OS=Homo sapiens GN=GLB1L3 PE=2 SV=3">
74c74
< <search_hit hit_rank="1" peptide="NQNESVSEIGGK" peptide_prev_aa="R" peptide_next_aa="I" protein="394680" num_tot_proteins="1" num_matched_ions="12" tot_num_ions="22" calc_neutral_pep_mass="1260.595" massdiff="-0.006000000000058" is_rejected="0" protein_descr="sp|Q68CR1|SE1L3_HUMAN Protein sel-1 homolog 3 OS=Homo sapiens GN=SEL1L3 PE=1 SV=2">
---
> <search_hit hit_rank="1" peptide="NQNESVSEIGGK" peptide_prev_aa="R" peptide_next_aa="I" protein="Q68CR1" num_tot_proteins="1" num_matched_ions="12" tot_num_ions="22" calc_neutral_pep_mass="1260.595" massdiff="-0.006000000000058" is_rejected="0" protein_descr="Protein sel-1 homolog 3 OS=Homo sapiens GN=SEL1L3 PE=1 SV=2">
--------------------------------------------------------------------------------

With regards,
Dr. Jagan Kommineni
Ludwig Institute for Cancer Research
Parkville VIC 3145
Australia.
