Hello again,

I realized that you sent me the database and that each protein is listed only once in it, so the problem is with the Phenyx pepXML dumper writing these incorrect tags. As a temporary workaround, I have removed these entries using the sed command:

sed -i 's/^<alternative_protein.*$//g'
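For anyone who would rather not edit the file in place with sed, the same workaround can be sketched in Python. This is only an illustration: the function name strip_alternative_proteins is made up here, and it assumes each alternative_protein element is self-closing and sits alone on its own line, as in the hit quoted further down. Unlike the sed pattern, which is anchored with ^ and so only matches tags that start in the first column, this version ignores leading whitespace and drops the line instead of leaving it blank.

```python
def strip_alternative_proteins(pepxml_text: str) -> str:
    """Drop every line that holds only an <alternative_protein .../> element.

    Hypothetical helper, not part of the TPP. Assumes the element is
    self-closing and alone on its line.
    """
    kept = []
    for line in pepxml_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("<alternative_protein") and stripped.endswith("/>"):
            continue  # skip the spurious alternative protein entry
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    sample = (
        '<search_hit hit_rank="1" protein="decoy_12505">\n'
        '  <search_score name="zscore" value="3.98486"/>\n'
        '  <alternative_protein protein="sp_human%decoy_12505"/>\n'
        '</search_hit>\n'
    )
    print(strip_alternative_proteins(sample), end="")
```

Reading the whole file into a string and writing the result to a new file keeps the original Phenyx output untouched, which may be handy if the exporter gets fixed later.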
Then I processed the file with PeptideProphetParser using the options MINPROB=0 DECOYPROBS NONPARAM DECOY=decoy. Your PeptideProphet processed file is posted here:

http://groups.google.com/group/spctools-discuss/web/interact.pep.xml4Ira

For the time being, please run RefreshParser as a separate step to map the identified peptides to the proteins in the database. My code changes will be committed later today.

-David

On Wed, Nov 10, 2010 at 12:35 PM, David Shteynberg <[email protected]> wrote:
> Hi Ira,
>
> Thanks for providing your Phenyx search results. I was able to
> resolve a bug in PeptideProphet, which was preventing it from reading
> in any Phenyx search results. At this point I still cannot process
> these files with PeptideProphet because of the alternative_protein
> entries. Somehow, the Phenyx pepXML exporter is writing each protein
> twice (once listing it as an alternative protein), as follows:
>
> <search_hit hit_rank="1" peptide="QVFKQFENYVMQFNFPEEEYIDNLHK"
> peptide_prev_aa="R" peptide_next_aa="M" protein="decoy_12505"
> num_tot_proteins="1" num_matched_ions="3" tot_num_ions="153"
> calc_neutral_pep_mass="3335.559715" massdiff="0.1563615"
> num_tol_term="2" num_missed_cleavages="1" is_rejected="0">
> <modification_info modified_peptide="QVFKQFENYVMQFNFPEEEYIDNLHK">
> </modification_info>
> <search_score name="zscore" value="3.98486"/>
> <search_score name="zvalue" value="3.375993e-05"/>
> <search_score name="origScore" value="-30.25199"/>
> <alternative_protein protein="sp_human%decoy_12505"/>
> </search_hit>
>
> Do you really have each protein listed twice in the database or is
> this pointing to a bug in the Phenyx pepXML dump? As far as
> PeptideProphet is concerned the second entry <alternative_protein
> protein="sp_human%decoy_12505"/> is not a DECOY since it doesn't begin
> with "decoy", and PeptideProphet doesn't consider any hit a decoy
> when at least one of the matched proteins in the DB (that contains that
> peptide sequence) is not a DECOY.
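The decoy rule described in the quoted paragraph above can be sketched in a few lines of Python. This is not PeptideProphet's actual code, just an illustration of the rule as stated: the function name is_decoy_hit is hypothetical, and the case-sensitive prefix match is an assumption.

```python
def is_decoy_hit(proteins, decoy_prefix="decoy"):
    """Return True only if every protein matched by the hit is a decoy.

    Hypothetical helper. 'proteins' holds the name from the search_hit
    protein attribute plus any alternative_protein names. Assumption:
    the prefix comparison is case-sensitive.
    """
    # A hit with no proteins at all is not treated as a decoy here.
    return bool(proteins) and all(
        name.startswith(decoy_prefix) for name in proteins
    )


if __name__ == "__main__":
    # The hit as Phenyx wrote it: the alternative entry starts with "sp_human%".
    print(is_decoy_hit(["decoy_12505", "sp_human%decoy_12505"]))
    # The same hit once the alternative_protein entry has been removed.
    print(is_decoy_hit(["decoy_12505"]))
```

Under this rule a single non-decoy name is enough to make the whole hit count as a target, which is why the extra "sp_human%" entries get in the way of the decoy-based model.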
>
> I think if this second issue can be resolved I will be able to compile
> a binary PeptideProphetParser that will not choke on this data.
>
> Cheers,
> -David
>
> On Tue, Nov 9, 2010 at 6:31 PM, ira cooke <[email protected]> wrote:
>> Hi David,
>> I've uploaded the database and original file .. the file is called
>> iracooke_phenyx_pepXML.tar.gz
>>
>> The files I've uploaded are all unmodified output from the search
>> (performed on a different machine to where the TPP is installed). I'd
>> previously tried changing some of the paths in the files to try and
>> fix them, but had no success.
>>
>> Also, the output of the commands I'm running is;
>>
>> /usr/local/tpp-4-4-0/bin/xinteract -Ninteract.pep.xml -p0 -eS -l7 -D'sphuman_20101013_DECOY.fasta' -OdP -ddecoy 475_pepxml.xml
>>
>> /usr/local/tpp-4-4-0/bin/xinteract (TPP v4.4 VUVUZELA rev 0, Build 201010010955 (linux))
>>
>> running: "/usr/local/tpp-4-4-0/bin/InteractParser 'interact.pep.xml' '475_pepxml.xml' -D'sphuman_20101013_DECOY.fasta' -L'7' -E'stricttrypsin'"
>> file 1: 475_pepxml.xml
>> processed altogether 116 results
>>
>> results written to file /var/www/ISB/data/Projects/Test/interact.pep.shtml
>>
>> command completed in 1 sec
>>
>> running: "/usr/local/tpp-4-4-0/bin/PeptideProphetParser 'interact.pep.xml' MINPROB=0 DECOYPROBS NONPARAM DECOY=decoy"
>> Using Decoy Label "decoy".
>> Decoy Probabilities will be reported.
>> Using non-parametric distributions
>> (PHENYX) (minprob 0)
>> WARNING!! The discriminant function for Phenyx is not yet complete.
>> It is presented here to help facilitate trial and discussion.
>> Reliance on this code for publishable scientific results is not
>> recommended.
>> init with PHENYX stricttrypsin
>> MS Instrument info: Manufacturer: ThermoFinnigan, Model: default, Ionization: FIXME, Analyzer: FIXME, Detector: FIXME
>>
>> PeptideProphet (TPP v4.4 VUVUZELA rev 0, Build 201010010955 (linux)) akel...@isb
>> read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
>> read in no data
>>
>> command "/usr/local/tpp-4-4-0/bin/PeptideProphetParser 'interact.pep.xml' MINPROB=0 DECOYPROBS NONPARAM DECOY=decoy" exited with non-zero exit code: 256
>> QUIT - the job is incomplete
>>
>> Thanks for your help
>> Ira
>>
>> On Nov 10, 3:08 am, David Shteynberg <[email protected]> wrote:
>>> It would be helpful to see all of the output from the latest command.
>>> You might have to unravel the pipeline and run the steps separately if
>>> the Phenyx pepXML is missing some information. Would it be possible
>>> for you to post your Phenyx pepXML file and the database so I can try
>>> them in a debugger?
>>>
>>> Thanks,
>>> -David
>>>
>>> On Mon, Nov 8, 2010 at 6:19 PM, ira cooke <[email protected]> wrote:
>>> > Thanks for your quick response.
>>> > I've modified my command to
>>>
>>> > /usr/local/tpp-4-4-0/bin/xinteract -Ninteract.pep.xml -p0 -eT -l7 -D/var/www/ISB/data/Databases/OnMascot/SPHuman/sphuman_20101013_DECOY.fasta -OdP -ddecoy 206_pepxml.xml
>>>
>>> > (also tried without the -Od option).
>>>
>>> > Unfortunately I still get the same error.
>>> > I've checked my 206_pepxml.xml file .. and the decoys are named as
>>> > follows
>>>
>>> > protein="decoy_9817"
>>>
>>> > Is it possible that the error is related to not having the raw
>>> > spectra? Or should I not worry about that?
>>>
>>> > Thanks for your help.
>>>
>>> > On Nov 9, 11:06 am, David Shteynberg <[email protected]>
>>> > wrote:
>>> >> Phenyx search results can only be processed with the semi-parametric
>>> >> modeling based on decoys. Judging from the name of your database, it
>>> >> does have decoys in there.
>>> >> Now you must tell PeptideProphet to use the semi-parametric model
>>> >> with the xinteract option -OP, and give it the tag that all your
>>> >> decoy protein names begin with using the xinteract -d flag, e.g.
>>> >> -dDECOY if all your decoy proteins begin with DECOY. You can also
>>> >> use option -Od to have PeptideProphet assign non-zero probabilities
>>> >> to the decoy hits.
>>>
>>> >> -David
>>>
>>> >> On Mon, Nov 8, 2010 at 2:36 PM, ira cooke <[email protected]>
>>> >> wrote:
>>> >> > Hi,
>>>
>>> >> > I've been struggling to run PeptideProphet on phenyx generated pepXML
>>> >> > files.
>>>
>>> >> > The error I get from PeptideProphet is "read in no data". Full output
>>> >> > from the tool is as follows;
>>>
>>> >> > ------------------ <BEGIN COMMANDLINE OUTPUT>-----------------
>>> >> > /usr/local/tpp-4-4-0/bin/xinteract -Ninteract.pep.xml -p0 -eT -l7 -D/var/www/ISB/data/Databases/OnMascot/SPHuman/sphuman_20101013_DECOY.fasta 206_pepxml.xml
>>>
>>> >> > /usr/local/tpp-4-4-0/bin/xinteract (TPP v4.4 VUVUZELA rev 0, Build 201010010955 (linux))
>>>
>>> >> > running: "/usr/local/tpp-4-4-0/bin/InteractParser 'interact.pep.xml' '206_pepxml.xml' -D'/var/www/ISB/data/Databases/OnMascot/SPHuman/sphuman_20101013_DECOY.fasta' -L'7' -E'trypsin'"
>>> >> > file 1: 206_pepxml.xml
>>> >> > processed altogether 182 results
>>>
>>> >> > results written to file /var/www/ISB/data/Projects/TRegs/SP/Phenyx/interact.pep.shtml
>>>
>>> >> > command completed in 3 sec
>>>
>>> >> > running: "/usr/local/tpp-4-4-0/bin/PeptideProphetParser 'interact.pep.xml' MINPROB=0"
>>> >> > (PHENYX) (minprob 0)
>>> >> > WARNING!! The discriminant function for Phenyx is not yet complete.
>>> >> > It is presented here to help facilitate trial and discussion.
>>> >> > Reliance on this code for publishable scientific results is not
>>> >> > recommended.
>>> >> > init with PHENYX trypsin
>>> >> > MS Instrument info: Manufacturer: ThermoFinnigan, Model: default, Ionization: FIXME, Analyzer: FIXME, Detector: FIXME
>>>
>>> >> > PeptideProphet (TPP v4.4 VUVUZELA rev 0, Build 201010010955 (linux)) akel...@isb
>>> >> > read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
>>> >> > read in no data
>>>
>>> >> > command "/usr/local/tpp-4-4-0/bin/PeptideProphetParser 'interact.pep.xml' MINPROB=0" exited with non-zero exit code: 256
>>> >> > QUIT - the job is incomplete
>>>
>>> >> > ------------------ <END COMMANDLINE OUTPUT>-----------------
>>>
>>> >> > Note that without the -eT option I get a crash (segmentation fault).
>>> >> > The enzyme specified inside the phenyx pepXML is
>>>
>>> >> > <sample_enzyme name="Trypsin_(KR_noP)">
>>> >> > </sample_enzyme>
>>>
>>> >> > (to be honest I'm not sure if this means trypsin or stricttrypsin ...
>>> >> > but that's probably another issue as I get the error with -eS option
>>> >> > as well).
>>>
>>> >> > Could this error be caused by a lack of raw data files? I ran the
>>> >> > phenyx searches on another computer and the file contains paths to raw
>>> >> > data on that computer. I did try fixing the paths (and copying the
>>> >> > raw data to a TPP accessible location) ... but that didn't work
>>> >> > either.
>>>
>>> >> > I guess my question is what does "read in no data" mean? Does data
>>> >> > refer to the original spectra, or does it refer to something in the
>>> >> > output of InteractParser (ie interact.pep.xml).
>>>
>>> >> > Any help at all on this issue would be much appreciated
>>>
>>> >> > Thanks
>>>
>>> >> > --
>>> >> > You received this message because you are subscribed to the Google
>>> >> > Groups "spctools-discuss" group.
>>> >> > To post to this group, send email to [email protected].
>>> >> > To unsubscribe from this group, send email to
>>> >> > [email protected].
>>> >> > For more options, visit this group at
>>> >> > http://groups.google.com/group/spctools-discuss?hl=en.
