I've written some scripts to perform various sorts of comparisons directly 
on the pep.xml files without recourse to other tools. If there's any 
interest in these, please let me know and I can make them more 
user-friendly. 
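
As a rough illustration of this kind of direct comparison (a minimal sketch, not the scripts mentioned above), the following counts how many top-N peptides each spectrum shares between two pep.xml files. It relies on the spectrum_query and search_hit elements and the spectrum, hit_rank and peptide attributes from the pepXML format. The function names top_hits and shared_top_hits are hypothetical, and the sketch assumes that comparing bare peptide sequences (ignoring modifications) and matching spectra by name is good enough.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def top_hits(source, n=5):
    """Map each spectrum name to the set of peptides among its top-n hits.

    `source` is a pep.xml path or an open binary file object.
    """
    hits = {}
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "spectrum_query":
            continue
        peptides = set()
        for hit in elem.iter():
            # Assumption: a missing hit_rank is treated as rank 1.
            if local(hit.tag) == "search_hit" and int(hit.get("hit_rank", "1")) <= n:
                peptides.add(hit.get("peptide"))
        hits[elem.get("spectrum")] = peptides
        elem.clear()  # drop the processed subtree to keep memory use low
    return hits


def shared_top_hits(hits_a, hits_b):
    """For spectra present in both searches, count the shared top-n peptides."""
    return {s: len(hits_a[s] & hits_b[s]) for s in hits_a.keys() & hits_b.keys()}
```

Usage would look like shared_top_hits(top_hits("a.pep.xml"), top_hits("b.pep.xml")), where the two file names are placeholders. Spectrum names only match if both searches were run on the same input files with the same naming.
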
BW,
Alastair

On Friday, 20 July 2018 17:28:52 UTC+2, 
alastair.s...@googlemail.com wrote:
>
> Hello,
>
> I'm working with some large protein databases derived from PacBio 
> sequencing projects which contain quite a lot of redundancy (different 
> isoforms and variants of the same gene / protein). I've noticed that the 
> protein level identifications I get are heavily dependent on the search 
> engine or combination of search engines I use. It concerns me, for example, 
> when a protein identified rather confidently (e.g. tens of PSMs, >5 peptides) 
> with one search engine doesn't appear in the list at all with another 
> search engine - even as a subset protein in a group. 
>
> To understand better how the particular nature of these databases is 
> affecting the protein identifications I want to compare the search results 
> in various ways. Before I invest time writing my own solutions, I wanted to 
> know if anyone knows of any software that can compute metrics such as the 
> following when comparing search results:
>
> 1) For each spectrum in the PSM searches, how many of the top N PSMs are 
> shared between the searches?
> 2) Proportion of all lead proteins (top protein from a group) that are the 
> same
> 3) Proportion of IDs that are the same when considering all group members
> 4) More complex measures of group similarity, e.g. the proportion of groups 
> where >X% of members are within a group together in the other search, and 
> how many of these equivalent groups contain the lead protein from the other 
> group?
> 5) Other things you can think of that might be useful
>
> I saw one tool called compid (
> https://pubs.acs.org/doi/pdf/10.1021/pr100824w), but it only works with 
> MASCOT and PARAGON results and the download is broken anyway.
>
> Are there any other solutions around?
>
> To work with the data myself I'd need to convert the pep.xml and prot.xml 
> data to tsv format. Obviously Petunia does this beautifully, but I'd rather 
> be able to process multiple files on the Linux command line if possible. I 
> saw an old post in this group about this issue and they suggested RpepXML 
> (I don't know if it will work for prot.xml) or tppXMLparser (which isn't 
> flexible in terms of the fields it can output). Are these the most current 
> solutions to the problem or is there something more recent that I've missed?
>
> Any ideas / thoughts / help would be very much appreciated!!
>
> Thanks,
>
> Alastair
>
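
Metrics 2 to 4 in the quoted list can be sketched with plain set operations. This is only an illustration under stated assumptions: each search is represented as a list of protein groups, each group is a list of accessions with the lead protein first, and the function names and the choice of denominators are hypothetical rather than taken from any existing tool.

```python
def lead_overlap(groups_a, groups_b):
    """Metric 2: fraction of lead proteins in A that are also leads in B."""
    leads_a = {g[0] for g in groups_a}
    leads_b = {g[0] for g in groups_b}
    return len(leads_a & leads_b) / len(leads_a) if leads_a else 0.0


def member_overlap(groups_a, groups_b):
    """Metric 3: Jaccard index over all group members of both searches."""
    all_a = {p for g in groups_a for p in g}
    all_b = {p for g in groups_b for p in g}
    union = all_a | all_b
    return len(all_a & all_b) / len(union) if union else 0.0


def equivalent_groups(groups_a, groups_b, min_fraction=0.5):
    """Metric 4: pair each A group with the B group sharing most members.

    A pair is kept only if more than `min_fraction` of the A group's
    members sit together in that B group. Each result is
    (lead of A group, lead of B group, whether A's lead is in the B group).
    """
    pairs = []
    for ga in groups_a:
        members = set(ga)
        best = max(groups_b, key=lambda gb: len(members & set(gb)), default=None)
        if best is None:
            continue
        if len(members & set(best)) / len(members) > min_fraction:
            pairs.append((ga[0], best[0], ga[0] in best))
    return pairs
```

The proportion asked for in metric 4 would then be len(equivalent_groups(a, b)) / len(a), and the number of pairs whose third field is true answers the lead-protein part of the question.
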

