[spctools-discuss] Re: protein prophet can't find protein in database

bill Tue, 17 Mar 2009 08:29:52 -0700
Thanks for taking a look. I'll post something if I find a fix.

On Mar 16, 6:23 pm, "Brian Pratt" <brian.pr...@insilicos.com> wrote:
> Well, the bad news is that I can't make it fail here.  
>
> I do notice that your fasta file size just passes the 4GB (2^32) mark, which
> makes me wonder a bit.  That should not matter since the file is read
> sequentially (there's no random access, which is usually where the problems
> arise on large files) but it is possible that your compiler and/or libraries
> are in need of updating.  Just a guess, I'm afraid, but the best I can do
> since I can't reproduce the problem.
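For reference, the 2^32 boundary mentioned above can be checked with a short Python sketch. This is only an illustration of why a file just past 4GB is suspicious when some part of the toolchain keeps sizes or offsets in 32 bits; the function names are hypothetical, and nothing here is taken from the TPP source.

```python
import os

LIMIT_32BIT = 2**32  # 4 GiB: first size that no longer fits in an unsigned 32-bit integer


def exceeds_32bit(size_bytes: int) -> bool:
    """True if this size cannot be stored in an unsigned 32-bit integer."""
    return size_bytes >= LIMIT_32BIT


def wrapped_offset(offset: int) -> int:
    """Value an offset would take if silently truncated to 32 bits (assumed failure mode)."""
    return offset % LIMIT_32BIT


def file_exceeds_32bit(path: str) -> bool:
    """Apply the size check to a file on disk."""
    return exceeds_32bit(os.path.getsize(path))
```

A file only a few bytes over the limit would, under that assumed truncation, appear to have offsets near zero again, which is the kind of symptom a stale compiler or library could produce.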
>
> -----Original Message-----
> From: spctools-discuss@googlegroups.com
> [mailto:spctools-disc...@googlegroups.com] On Behalf Of bill
> Sent: Monday, March 16, 2009 6:46 AM
> To: spctools-discuss
> Subject: [spctools-discuss] Re: protein prophet can't find protein in
> database
>
> Thanks for your help.
>
> On Mar 13, 4:22 pm, Insilicos <brian.pratt.insili...@gmail.com> wrote:
> > Looks like this is a Monday task now... Sorry.
>
> > Brian
>
> > On Mar 12, 2009, at 2:41 PM, bill <nelson...@gmail.com> wrote:
>
> > > much obliged.
> > > > It's on its way as kentucky.tar. It's not named correctly, though: I
> > > > compressed it with tar -czf.
> > > Thanks,
> > > Bill
>
> > > On Mar 12, 4:53 pm, "Brian Pratt" <brian.pr...@insilicos.com> wrote:
> > > >> That message is coming from the "batchcoverage" program, which gets invoked
> > > >> by Protein Prophet.  It parses that .covinfo tempfile.
>
> > > >> Perhaps you could ftp that .covinfo file along with the fasta file
> > > >> (compressed!) to ftp://insilicos.serveftp.net.pub and I can look for clues.
>
> > >> Brian Pratt
> > >> Insilicos LLC
>
> > >> -----Original Message-----
> > > >> From: spctools-discuss@googlegroups.com
> > > >> [mailto:spctools-disc...@googlegroups.com] On Behalf Of bill
> > >> Sent: Thursday, March 12, 2009 1:06 PM
> > >> To: spctools-discuss
> > >> Subject: [spctools-discuss] Re: protein prophet can't find protein in
> > >> database
>
> > > >> I can't see anything weird with the pep.xml or the fasta entries, but
> > > >> it always fails at the same protein for a run (a different protein on
> > > >> different runs). Below are clips from the pep.xml and the fasta of the
> > > >> entries (and the two flanking entries) for a failing accession
> > > >> number, 85089826. I used vi with the :set list option so hidden
> > > >> characters are shown.
>
> > > >> One clue is that a temporary file named *.pep-prot.xml.covinfo is
> > > >> created. The failing accession number is the first entry in this file.
>
> > >> Thanks for any ideas or suggestions of where to check next.
> > >> Bill
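The manual check described above (finding the failing accession and exposing hidden characters with vi's :set list) can also be scripted. The following is a minimal Python sketch, not part of the TPP; the function names and the sample data are made up for illustration. It assumes NCBI-style `gi|<number>|` headers like the ones in the clips below, where `^A` is the Ctrl-A byte (0x01) that separates merged deflines.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def find_headers(fasta: TextIO, accession: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, header) for FASTA headers containing gi|<accession>|."""
    needle = f"gi|{accession}|"
    for lineno, line in enumerate(fasta, start=1):
        if line.startswith(">") and needle in line:
            yield lineno, line.rstrip("\n")


def control_chars(text: str) -> List[Tuple[int, str]]:
    """Return (position, repr) for every character below 0x20 except tab."""
    return [(i, repr(c)) for i, c in enumerate(text) if ord(c) < 0x20 and c != "\t"]


# Made-up sample: a Ctrl-A in the first header, a stray carriage return in the second.
SAMPLE = (
    ">gi|1|ref|A| first\x01gi|2|gb|B| second\n"
    "MKV\n"
    ">gi|85089826|ref|XP_958128.1| hypothetical protein\r\n"
    "MAP\n"
)

hits = list(find_headers(io.StringIO(SAMPLE), "85089826"))
for lineno, header in hits:
    print(lineno, control_chars(header))
```

On a real file, opening it with `open(path, newline="")` keeps carriage returns visible instead of letting Python translate them away.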
>
> > > >> >gi|186687019|ref|YP_001870408.1| hypothetical protein Npun_DR020 [Nostoc punctiforme PCC 73102]^Agi|186469643|gb|ACC85440.1| conserved hypothetical protein [Nostoc punctiforme PCC 73102]$
> > > >> MARNYGAAKLKNHIHNNAPGTLGAMGRLFDVSESDFNKALLGDQTVITKIADMARLSDTAKANLPKALEVYRKIIETTGD$
> > > >> INQAYAELVQLTQKHGTQTLKAINTSKTGEQRFKNEMTEMQAEHVNATTAEATRHAQRSSLIQISGATADLMAIAKYEAD$
> > > >> LIKASNKVPEAQDAADRAYETAVTSALWTNGSEAKTDRIPKPNYSRTAGISRVGQWFRNFMGI$
> > > >> >gi|85089826|ref|XP_958128.1| hypothetical protein NCU05933 [Neurospora crassa OR74A]^Agi|28919453|gb|EAA28892.1| predicted protein [Neurospora crassa OR74A]$
> > > >> MAPATMTLPSQPFTCLLGQVKSPRHGPSGSPPAEPVTALFDNINTHLMMDSDSALDSDVEVDVVIVGAGLSGLRAAVELH$
> > > >> KAGLAIAVLEESNRVGGQCCTTFARHTTRITSIGDTHTEMLSLAKDCKIDLTKQHKEGLDLQQRYDGFQDSPQVSETIPA$
> > > >> LETAVWPPQEYQLSFRSLIEDPAIQKFYRRISDLSETWHGPDALFLDAVSFLELVEANFFHSKAVQQEAHFLTRYLLSVD$
> > > >> PARVSALYVLDHIASGGGLANLYFHPDSPGGGAHHFRVPQGPRAFITKLVDLLPEGCIHLSTMATKITQQTTPISKIYYP$
> > > >> SHPCKVETSPPVDSDSTASTTASTTTKTFYAKKVLLATPPALYSPCWSHLPEVLTFHPPLPPHKLASINRHSGEYNNYGI$
> > > >> FTTVTFYFSQPWWREAGLSGSMDCRVTRDLDGPISWVRDTSEDKGGGKINTNKKVWSLTCWCAGENAYEVWKWYDKTYPV$
> > > >> DDEKPEEIKGGWKASPVWMHLLRVFKERMDALEMEIPLPGEGFSDSEEEEEKENEMEEGAEPPSVYLFRKWTGLCFTPPR$
> > > >> ALPSNGEELEELLRTWKYDGPGKGDATFREPFSNVHFAGTETAEEWRGFMEGAARSGLRGAKEIIQALEKGNLKDGEGGK$
> > > >> NKMLKALL$
> > > >> >gi|186687127|ref|YP_001870270.1| helicase domain-containing protein [Nostoc punctiforme PCC 73102]^Agi|186469430|gb|ACC85229.1| helicase domain protein [Nostoc punctiforme PCC 73102]$
> > > >> MAQTVNHSPGSIVTCRSRQWVILPSENQDVIRLRPLSGNEDEIAGIYQKLLEEELEKIESATFPLPQATSVQDHAAALLL$
> > > >> MDAARLLLRSGAGPFRCLGRLSLRPRPYQLVPLLMALKLETVKLLVADDVGIGKTIEAGLIARELLDRGEVKRIAVLCPP$
> > > >> HLCDQWQQELREKFHIDAVVVRSGTASKLERNIPNNDSVFSYYRHLIVSLDYAKAERRRASFITHCPDLVIVDEAHTCAR$
> > > >> PNKTTTSQQQRHQLITEIAQKQEQHLLLLTATPHSGIEESFLSLLGLLKPEFEHFNLNSLTDKQRDHLANHFVQRRRADV$
> > > >> KLWLGNETPFPERESSEESYKLSKEYKELFDEVYDFARGLVKTTTADMSHAQRRGRYWSALALIRCVMSSPAAAIATLNR$
> > > >> QVSKSGGSLTDLDEDLMSSYVHDPTEQEQAVDASPTVVIEQGQQSYKDADKRKLKAFVQSAEKLQGGKDQKLQSCIATVE$
> > > >> SLLKDEMNPIVWCRYIATANYVADALRQKLQKKGSQIRVIAITGELSEDEREIRLEELKSYPQRVLVATDCLSEGVNLQT$
> > > >> HFSAVIHYDLPWNPNRLEQREGRIDRYGQTATKVKACLLYGRDNPVDGAVLDVLIRKAVQIHKSLGITVPVPMESTTVAE$
> > > >> AVFKSLFERTTEVIQLSLFDFQEESAVDKVHKNWDNAVEREKTNRTRFAQRAIKPEQVEQELIDSDQILGNEQDVERFVI$
> > > >> SACDRISCYLIKKKQGWLLPQPPDFLKSTLGDKSRLLTFTTPAPEGVEYVGRNHPLVEGLAQYILEDALSLAVEPIAARC$
> > > >> GFTTTNAVQKRTTLLLVRLRHLLDSSRRTTETKNTTLLAEECAVIGFTGSPSSPNWLPQLEATRVLQEAKPVSDAGKAIK$
> > > >> QGEIAELLPRLEELQPDLEKFAGQRAEELLQSHKRVRDITKEGRIRVTPQLPMDVLGVFILQPGRK$
>
> > >> <spectrum_query spectrum="UPS1-01.01153.01153.1" start_scan="1153"
> > >> end_scan="1153" precursor_neutral_mass="686.594836"  
> > >> assumed_charge="1"
> > >> index="105" retention_time_sec="2777.92">$
> > >> <search_result>$
> > >> <search_hit hit_rank="1" peptide="MLVIVI" peptide_prev_aa="K"
> > >> peptide_next_aa="-" protein="gi|220838530|emb|CAX15281.1|"
> > >> protein_descr="syntaxin 16 [Mus musculus]" num_tot_proteins="1"
> > >> num_matched_ions="7" tot_num_ions="10"
> > >> calc_neutral_pep_mass="686.439724" massdiff="0.155112"
> > >> num_tol_term="2" num_missed_cleavages="0" is_rejected="0">$
> > >> <search_score name="hyperscore" value="20.2"/>$
> > >> <search_score name="nextscore" value="19.4"/>$
> > >> <search_score name="bscore" value="10.4"/>$
> > >> <search_score name="yscore" value="10.4"/>$
> > >> <search_score name="expect" value="2.2"/>$
> > >> <analysis_result analysis="peptideprophet">$
> > >> <peptideprophet_result probability="0.9991"
> > >> all_ntt_prob="(0.0000,0.0000,0.9991)">$
> > >> <search_score_summary>$
> > >> <parameter name="fval" value="0.4444"/>$
> > >> <parameter name="ntt" value="2"/>$
> > >> <parameter name="nmc" value="0"/>$
> > >> <parameter name="massd" value="0.155"/>$
> > >> </search_score_summary>$
> > >> </peptideprophet_result>$
> > >> </analysis_result>$
> > >> </search_hit>$
> > >> </search_result>$
> > >> </spectrum_query>$
> > >> <spectrum_query spectrum="UPS1-01.01154.01154.1" start_scan="1154"
> > >> end_scan="1154" precursor_neutral_mass="688.567614"  
> > >> assumed_charge="1"
> > >> index="106" retention_time_sec="2781.69">$
> > >> <search_result>$
> > >> <search_hit hit_rank="1" peptide="MLKALL" peptide_prev_aa="K"
> > >> peptide_next_aa="-" protein="gi|85089826|ref|XP_958128.1|"
> > >> protein_descr="hypothetical protein NCU05933 [Neurospora crassa
> > >> OR74A]" num_tot_proteins="1" num_matched_ions="6" tot_num_ions="10"
> > >> calc_neutral_pep_mass="687.435724" massdiff="1.131890"
> > >> num_tol_term="2" num_missed_cleavages="1" is_rejected="0">$
> > >> <search_score name="hyperscore" value="17.3"/>$
> > >> <search_score name="nextscore" value="15.4"/>$
> > >> <search_score name="bscore" value="10.1"/>$
> > >> <search_score name="yscore" value="9.7"/>$
> > >> <search_score name="expect" value="8.2"/>$
> > >> <analysis_result analysis="peptideprophet">$
> > >> <peptideprophet_result probability="0.9938"
> > >> all_ntt_prob="(0.0000,0.0000,0.9938)">$
> > >> <search_score_summary>$
> > >> <parameter name="fval" value="0.0137"/>$
> > >> <parameter name="ntt" value="2"/>$
> > >> <parameter name="nmc" value="1"/>$
> > >> <parameter name="massd" value="1.132"/>$
> > >> </search_score_summary>$
> > >> </peptideprophet_result>$
> > >> </analysis_result>$
> > >> </search_hit>$
> > >> </search_result>$
> > >> </spectrum_query>$
> > >> <spectrum_query spectrum="UPS1-01.01161.01161.1" start_scan="1161"
> > >> end_scan="1161" precursor_neutral_mass="1077.998217"
> > >> assumed_charge="1" index="107" retention_time_sec="2803.42">$
> > >> <search_result>$
> > >> <search_hit hit_rank="1" peptide="VLDYLEAGAK" peptide_prev_aa="K"
> > >> peptide_next_aa="A" protein="gi|55980906|ref|YP_144203.1|"
> > >> protein_descr="hypothetical protein TTHA0937 [Thermus thermophilus
> > >> HB8]" num_tot_proteins="2" num_matched_ions="9" tot_num_ions="18"
> > >> calc_neutral_pep_mass="1077.570724" massdiff="0.427493"
> > >> num_tol_term="2" num_missed_cleavages="0" is_rejected="0">$
> > >> <alternative_protein protein="gi|46198875|ref|YP_004542.1|"
> > >> protein_descr="putative cytoplasmic protein [Thermus thermophilus
> > >> HB27]" num_tol_term="2"/>$
> > >> <search_score name="hyperscore" value="26.0"/>$
> > >> <search_score name="nextscore" value="25.5"/>$
> > >> <search_score name="bscore" value="9.9"/>$
> > >> <search_score name="yscore" value="10.6"/>$
> > >> <search_score name="expect" value="8.2"/>$
> > >> <analysis_result analysis="peptideprophet">$
>
> > >> On Mar 10, 3:43 pm, "Brian Pratt" <brian.pr...@insilicos.com> wrote:
> > > >>> My guess would be it's something in how the database is formatted, rather
> > > >>> than its size - or possibly something to do with that ">" character which
> > > >>> is causing unhappiness in XML (where it's a reserved character).
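To illustrate the point about ">" being reserved in XML: a minimal Python sketch using the standard library's `xml.sax.saxutils`, showing how a FASTA-style header would be escaped before being written into an XML attribute or text node. The header string is taken from the clips in this thread; whether any TPP component escapes text this way is not something this thread confirms.

```python
from xml.sax.saxutils import escape, unescape

# A FASTA header as it appears in the database, leading ">" included.
header = ">gi|85089826|ref|XP_958128.1| hypothetical protein NCU05933"

# escape() replaces &, < and > with their XML entities.
escaped = escape(header)
print(escaped)

# unescape() reverses the substitution, so the original header round-trips.
restored = unescape(escaped)
print(restored == header)
```

If a writer escapes the header but a reader later compares it against the raw fasta text without unescaping, the two strings would not match, which is one way a ">" could lead to a lookup failure.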
>
> > >>> -----Original Message-----
>
> ...