I've written some scripts that perform various sorts of comparisons directly on the pep.xml files, without recourse to other tools. If there's any interest in these, please let me know and I can make them more user-friendly.

BW,
Alastair
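For anyone who wants a starting point in the meantime, here is a minimal sketch (not the scripts mentioned above) of comparing two pep.xml files directly, counting for each spectrum how many of the top N peptides are shared between two searches. It assumes the usual pepXML layout of `spectrum_query` elements containing `search_hit` elements with `hit_rank` and `peptide` attributes, and that both searches name their spectra identically in the `spectrum` attribute. The function names `top_hits` and `shared_top_n` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}search_hit'."""
    return tag.rsplit("}", 1)[-1]


def top_hits(source, n=1):
    """Map spectrum name -> set of peptides whose hit_rank is <= n.

    `source` is a file name or a binary file object containing pepXML.
    Assumption: hits without a hit_rank attribute are treated as rank 1.
    """
    hits = defaultdict(set)
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "spectrum_query":
            continue
        spectrum = elem.get("spectrum")
        for hit in elem.iter():
            if local(hit.tag) == "search_hit" and int(hit.get("hit_rank", "1")) <= n:
                hits[spectrum].add(hit.get("peptide"))
        elem.clear()  # free memory; pep.xml files can be large
    return dict(hits)


def shared_top_n(hits_a, hits_b):
    """For spectra present in both searches, count the shared top-N peptides."""
    common = hits_a.keys() & hits_b.keys()
    return {s: len(hits_a[s] & hits_b[s]) for s in common}
```

Usage would look something like `shared_top_n(top_hits("search_a.pep.xml", 5), top_hits("search_b.pep.xml", 5))`, where both file names are placeholders. Peptides are compared as plain sequences here, so modifications are ignored.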
On Friday, 20 July 2018 17:28:52 UTC+2, alastair.s...@googlemail.com wrote:
>
> Hello,
>
> I'm working with some large protein databases derived from PacBio
> sequencing projects which contain quite a lot of redundancy (different
> isoforms and variants of the same gene / protein). I've noticed that the
> protein-level identifications I get are heavily dependent on the search
> engine or combination of search engines I use. It concerns me, for example,
> when a protein identified rather confidently (e.g. tens of PSMs, >5 peptides)
> with one search engine doesn't appear in the list at all with another
> search engine - even as a subset protein in a group.
>
> To understand better how the particular nature of these databases is
> affecting the protein identifications, I want to compare the search results
> in various ways. Before I invest time writing my own solutions, I wanted to
> know if anyone knows of any software that can compute metrics such as the
> following when comparing search results:
>
> 1) For each spectrum in the PSM searches, how many of the top N PSMs are
>    shared between the searches?
> 2) Proportion of all lead proteins (top protein from a group) that are the
>    same
> 3) Proportion of IDs that are the same when considering all group members
> 4) More complex measures of group similarity: e.g. proportion of groups
>    where >X% of members are within a group together in the other search, and
>    how many of these equivalent groups contain the lead protein from the
>    other group?
> 5) Other things you can think of that might be useful
>
> I saw one tool called compid
> (https://pubs.acs.org/doi/pdf/10.1021/pr100824w), but it only works with
> MASCOT and PARAGON results and the download is broken anyway.
>
> Are there any other solutions around?
>
> To work with the data myself I'd need to convert the pep.xml and prot.xml
> data to tsv format.
> Obviously Petunia does this beautifully, but I'd rather
> be able to process multiple files on the Linux command line if possible. I
> saw an old post in this group about this issue and they suggested RpepXML
> (I don't know if it will work for prot.xml) or tppXMLparser (which isn't
> flexible in terms of the fields it can output). Are these the most current
> solutions to the problem or is there something more recent that I've missed?
>
> Any ideas / thoughts / help would be very much appreciated!!
>
> Thanks,
>
> Alastair
>

--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spctools-discuss+unsubscr...@googlegroups.com.
To post to this group, send email to spctools-discuss@googlegroups.com.
Visit this group at https://groups.google.com/group/spctools-discuss.
For more options, visit https://groups.google.com/d/optout.
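The group-level metrics 2) to 4) from the question could be sketched roughly as below. This is only one possible reading of them: each search result is assumed to be already reduced to a list of protein groups, each group a non-empty list of accessions with the lead protein first; "proportion that are the same" is taken to mean the Jaccard index (shared / union); and a group counts as "equivalent" when more than `min_fraction` of its members fall into a single group of the other search. All function names are hypothetical.

```python
def jaccard(a, b):
    """Shared items divided by all items seen in either collection."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def lead_overlap(groups_a, groups_b):
    """Metric 2: overlap of lead proteins (first member of each group)."""
    return jaccard((g[0] for g in groups_a), (g[0] for g in groups_b))


def member_overlap(groups_a, groups_b):
    """Metric 3: overlap of all identified proteins, ignoring grouping."""
    return jaccard(
        (p for g in groups_a for p in g),
        (p for g in groups_b for p in g),
    )


def matched_groups(groups_a, groups_b, min_fraction=0.5):
    """Metric 4: return (n_matched, n_with_lead).

    n_matched: groups in A with more than `min_fraction` of their members
    together in one group of B (the B group with the largest overlap).
    n_with_lead: how many of those A groups also contain that B group's lead.
    """
    matched = with_lead = 0
    for ga in groups_a:
        members = set(ga)
        if not members or not groups_b:
            continue
        best = max(groups_b, key=lambda gb: len(members & set(gb)))
        if len(members & set(best)) / len(members) > min_fraction:
            matched += 1
            if best[0] in members:
                with_lead += 1
    return matched, with_lead
```

Dividing `n_matched` by `len(groups_a)` gives the proportion asked for in 4); running the function in both directions shows whether the agreement is symmetric.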