Hi Ira,

Thanks for providing your Phenyx search results. I was able to fix a bug in PeptideProphet that was preventing it from reading in any Phenyx search results. However, I still cannot process these files with PeptideProphet because of the alternative_protein entries. Somehow, the Phenyx pepXML exporter is writing each protein twice (the second time listing it as an alternative protein), as follows:

<search_hit hit_rank="1" peptide="QVFKQFENYVMQFNFPEEEYIDNLHK" peptide_prev_aa="R" peptide_next_aa="M" protein="decoy_12505" num_tot_proteins="1" num_matched_ions="3" tot_num_ions="153" calc_neutral_pep_mass="3335.559715" massdiff= "0.1563615" num_tol_term="2" num_missed_cleavages="1" is_rejected="0">
<modification_info modified_peptide="QVFKQFENYVMQFNFPEEEYIDNLHK">
</modification_info>
<search_score name="zscore" value="3.98486"/>
<search_score name="zvalue" value="3.375993e-05"/>
<search_score name="origScore" value="-30.25199"/>
<alternative_protein protein="sp_human%decoy_12505"/>
</search_hit>

Do you really have each protein listed twice in the database, or does this point to a bug in the Phenyx pepXML dump? As far as PeptideProphet is concerned, the second entry

<alternative_protein protein="sp_human%decoy_12505"/>

is not a DECOY, since its name doesn't begin with "decoy", and PeptideProphet doesn't consider any hit a decoy when at least one of the matched proteins in the DB (that contains that peptide sequence) is not a DECOY. I think that if this second issue can be resolved, I will be able to compile a PeptideProphetParser binary that will not choke on this data.

Cheers,
-David

On Tue, Nov 9, 2010 at 6:31 PM, ira cooke <iraco...@googlemail.com> wrote:
> Hi David,
> I've uploaded the database and original file .. the file is called
> iracooke_phenyx_pepXML.tar.gz
>
> The files I've uploaded are all unmodified output from the search
> (performed on a different machine to where the TPP is installed). I'd
> previously tried changing some of the paths in the files to try and
> fix them, but had no success.
>
> Also, the output of the commands I'm running is;
>
> /usr/local/tpp-4-4-0/bin/xinteract -Ninteract.pep.xml -p0 -eS -l7 -D'sphuman_20101013_DECOY.fasta' -OdP -ddecoy 475_pepxml.xml
>
> /usr/local/tpp-4-4-0/bin/xinteract (TPP v4.4 VUVUZELA rev 0, Build 201010010955 (linux))
>
> running: "/usr/local/tpp-4-4-0/bin/InteractParser 'interact.pep.xml' '475_pepxml.xml' -D'sphuman_20101013_DECOY.fasta' -L'7' -E'stricttrypsin'"
> file 1: 475_pepxml.xml
> processed altogether 116 results
>
> results written to file /var/www/ISB/data/Projects/Test/interact.pep.shtml
>
> command completed in 1 sec
>
> running: "/usr/local/tpp-4-4-0/bin/PeptideProphetParser 'interact.pep.xml' MINPROB=0 DECOYPROBS NONPARAM DECOY=decoy"
> Using Decoy Label "decoy".
> Decoy Probabilities will be reported.
> Using non-parametric distributions
> (PHENYX) (minprob 0)
> WARNING!! The discriminant function for Phenyx is not yet complete.
> It is presented here to help facilitate trial and discussion.
> Reliance on this code for publishable scientific results is not
> recommended.
> init with PHENYX stricttrypsin
> MS Instrument info: Manufacturer: ThermoFinnigan, Model: default, Ionization: FIXME, Analyzer: FIXME, Detector: FIXME
>
> PeptideProphet (TPP v4.4 VUVUZELA rev 0, Build 201010010955 (linux)) akel...@isb
> read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
> read in no data
>
> command "/usr/local/tpp-4-4-0/bin/PeptideProphetParser 'interact.pep.xml' MINPROB=0 DECOYPROBS NONPARAM DECOY=decoy" exited with non-zero exit code: 256
> QUIT - the job is incomplete
>
> Thanks for your help
> Ira
>
> On Nov 10, 3:08 am, David Shteynberg <dshteynb...@systemsbiology.org> wrote:
>> It would be helpful to see all of the output from the latest command.
>> You might have to unravel the pipeline and run the steps separately if
>> the Phenyx pepXML is missing some information.
>> Would it be possible for you to post your Phenyx pepXML file and the
>> database so I can try it in a debugger?
>>
>> Thanks,
>> -David
>>
>> On Mon, Nov 8, 2010 at 6:19 PM, ira cooke <iraco...@googlemail.com> wrote:
>> > Thanks for your quick response.
>> > I've modified my command to
>> >
>> > /usr/local/tpp-4-4-0/bin/xinteract -Ninteract.pep.xml -p0 -eT -l7 -D/var/www/ISB/data/Databases/OnMascot/SPHuman/sphuman_20101013_DECOY.fasta -OdP -ddecoy 206_pepxml.xml
>> >
>> > (also tried without the -Od option).
>> >
>> > Unfortunately I still get the same error.
>> > I've checked my 206_pepxml.xml file .. and the decoys are named as
>> > follows
>> >
>> > protein="decoy_9817"
>> >
>> > Is it possible that the error is related to not having the raw
>> > spectra? Or should I not worry about that?
>> >
>> > Thanks for your help.
>> >
>> > On Nov 9, 11:06 am, David Shteynberg <dshteynb...@systemsbiology.org> wrote:
>> >> Phenyx search results can only be processed with the semi-parametric
>> >> modeling based on decoys. Judging from the name of your database, it
>> >> does have decoys in there. Now you must tell PeptideProphet to use the
>> >> semi-parametric model with xinteract option -OP, and give it the tag
>> >> that all your decoy proteins begin with using the xinteract flag -d,
>> >> e.g. -dDECOY if all your decoy proteins begin with DECOY. You can also
>> >> use option -Od to have PeptideProphet assign non-zero probabilities to
>> >> the decoy hits.
>> >>
>> >> -David
>> >>
>> >> On Mon, Nov 8, 2010 at 2:36 PM, ira cooke <iraco...@googlemail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I've been struggling to run PeptideProphet on phenyx generated pepXML
>> >> > files.
>> >> >
>> >> > The error I get from PeptideProphet is "read in no data".
>> >> > Full output from the tool is as follows;
>> >> >
>> >> > ------------------ <BEGIN COMMANDLINE OUTPUT>-----------------
>> >> > /usr/local/tpp-4-4-0/bin/xinteract -Ninteract.pep.xml -p0 -eT -l7 -D/var/www/ISB/data/Databases/OnMascot/SPHuman/sphuman_20101013_DECOY.fasta 206_pepxml.xml
>> >> >
>> >> > /usr/local/tpp-4-4-0/bin/xinteract (TPP v4.4 VUVUZELA rev 0, Build 201010010955 (linux))
>> >> >
>> >> > running: "/usr/local/tpp-4-4-0/bin/InteractParser 'interact.pep.xml' '206_pepxml.xml' -D'/var/www/ISB/data/Databases/OnMascot/SPHuman/sphuman_20101013_DECOY.fasta' -L'7' -E'trypsin'"
>> >> > file 1: 206_pepxml.xml
>> >> > processed altogether 182 results
>> >> >
>> >> > results written to file /var/www/ISB/data/Projects/TRegs/SP/Phenyx/interact.pep.shtml
>> >> >
>> >> > command completed in 3 sec
>> >> >
>> >> > running: "/usr/local/tpp-4-4-0/bin/PeptideProphetParser 'interact.pep.xml' MINPROB=0"
>> >> > (PHENYX) (minprob 0)
>> >> > WARNING!! The discriminant function for Phenyx is not yet complete.
>> >> > It is presented here to help facilitate trial and discussion.
>> >> > Reliance on this code for publishable scientific results is not
>> >> > recommended.
>> >> > init with PHENYX trypsin
>> >> > MS Instrument info: Manufacturer: ThermoFinnigan, Model: default, Ionization: FIXME, Analyzer: FIXME, Detector: FIXME
>> >> >
>> >> > PeptideProphet (TPP v4.4 VUVUZELA rev 0, Build 201010010955 (linux)) akel...@isb
>> >> > read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
>> >> > read in no data
>> >> >
>> >> > command "/usr/local/tpp-4-4-0/bin/PeptideProphetParser 'interact.pep.xml' MINPROB=0" exited with non-zero exit code: 256
>> >> > QUIT - the job is incomplete
>> >> >
>> >> > ------------------ <END COMMANDLINE OUTPUT>-----------------
>> >> >
>> >> > Note that without the -eT option I get a crash (segmentation fault).
>> >> > The enzyme specified inside the phenyx pepXML is
>> >> >
>> >> > <sample_enzyme name="Trypsin_(KR_noP)">
>> >> > </sample_enzyme>
>> >> >
>> >> > (to be honest I'm not sure if this means trypsin or stricttrypsin ...
>> >> > but that's probably another issue as I get the error with -eS option
>> >> > as well).
>> >> >
>> >> > Could this error be caused by a lack of raw data files? I ran the
>> >> > phenyx searches on another computer and the file contains paths to raw
>> >> > data on that computer. I did try fixing the paths (and copying the
>> >> > raw data to a TPP accessible location) ... but that didn't work
>> >> > either.
>> >> >
>> >> > I guess my question is what does "read in no data" mean? Does data
>> >> > refer to the original spectra, or does it refer to something in the
>> >> > output of InteractParser (ie interact.pep.xml).
>> >> >
>> >> > Any help at all on this issue would be much appreciated
>> >> >
>> >> > Thanks
>> >> >
>> >> > --
>> >> > You received this message because you are subscribed to the Google
>> >> > Groups "spctools-discuss" group.
>> >> > To post to this group, send email to spctools-disc...@googlegroups.com.
>> >> > To unsubscribe from this group, send email to
>> >> > spctools-discuss+unsubscr...@googlegroups.com.
>> >> > For more options, visit this group at
>> >> > http://groups.google.com/group/spctools-discuss?hl=en.
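
For anyone who hits the same "read in no data" problem: the decoy rule David describes at the top of the thread (a hit counts as a decoy only if every matched protein name begins with the decoy tag) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the thread, not PeptideProphet's actual code. The function names protein_names and is_decoy_hit are made up for this sketch, the search_hit fragment is trimmed from the one quoted in the thread, and it carries no XML namespace, whereas a complete pepXML file normally declares one, so the tag lookups would need adjusting for a real file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the search_hit quoted in the thread (most attributes omitted).
SEARCH_HIT = """
<search_hit hit_rank="1" peptide="QVFKQFENYVMQFNFPEEEYIDNLHK"
            protein="decoy_12505" num_tot_proteins="1">
  <search_score name="zscore" value="3.98486"/>
  <alternative_protein protein="sp_human%decoy_12505"/>
</search_hit>
"""


def protein_names(hit):
    """Return the primary protein followed by any alternative proteins."""
    names = [hit.get("protein")]
    names += [alt.get("protein") for alt in hit.findall("alternative_protein")]
    return names


def is_decoy_hit(hit, decoy_tag="decoy"):
    """Hypothetical helper: a hit is a decoy only if ALL of its protein
    names begin with the decoy tag (the rule as described in the thread)."""
    return all(name.startswith(decoy_tag) for name in protein_names(hit))


hit = ET.fromstring(SEARCH_HIT)
print(protein_names(hit))  # ['decoy_12505', 'sp_human%decoy_12505']
print(is_decoy_hit(hit))   # False: the alternative name starts with "sp_human%"
```

Under this rule the hit is not counted as a decoy, because the duplicated entry "sp_human%decoy_12505" does not begin with "decoy", even though it appears to name the same protein as "decoy_12505".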