I'm working with some users whose Mascot2XML runs produce a malformed pepXML
file when a protein description contains special XML characters, such as
< and >. The output pepXML file includes this text:
...
<alternative_protein protein="tr|A0N5G5|A0N5G5_HUMAN"
protein_descr="Rheumatoid factor D5 light chain (Fragment) OS=Homo sapiens
GN=V<kappa>3 PE=2 SV=1" num_tol_term="2" peptide_prev_aa="R"
peptide_next_aa="A"/>
<kappa>
<kappa>
<search_score name="ionscore" value="46.68"/>
...
Note the "<kappa>" that's included, unencoded, in the value of
the protein_descr attribute.
I'm attaching a patch that applies the same encoding approach already used
for the primary protein identification, in place since revision 4877 in 2010.
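
For anyone who doesn't have the source handy, here is a minimal sketch of the
kind of escaping involved. It is an illustration only, not the TPP XMLEscape()
source: the helper name escapeXmlAttr is hypothetical, and I'm assuming that
replacing the five predefined XML entities is all that's needed.

```cpp
#include <string>

// Hypothetical helper for illustration; NOT the actual TPP XMLEscape() code.
// Replaces the five characters that have predefined XML entities so the
// result can be placed safely inside a double-quoted attribute value.
std::string escapeXmlAttr(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

With escaping like this, the description above would be written with
GN=V&lt;kappa&gt;3, which keeps the attribute value well-formed.
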
However, I'm also worried about the unclosed <kappa> tags immediately
afterwards. I assume they're coming from the modtags code a little later in
that same function, but after a little poking around I've been unable to find
the implementation of writeTraditional(). Does anyone have ideas on what
might be going wrong here?
Thanks,
Josh
Index: MascotConverter.cxx
===================================================================
--- MascotConverter.cxx (revision 7405)
+++ MascotConverter.cxx (working copy)
@@ -3391,7 +3391,7 @@
if (generate_description_) {
proteinMap::const_iterator it = proteinDescriptionMap_.find (rank1_proteins_[rank1Index]);
if (it != proteinDescriptionMap_.end()) {
- fprintf(fout, " protein_descr=\"%s\"", it->second.c_str());
+ fprintf(fout, " protein_descr=\"%s\"", XMLEscape(it->second).c_str());
}
else {
fprintf(fout, " protein_descr=\"NON_EXISTENT PROTEIN DESCRIPTION\"");