Yes, one would want to escape everything properly - happily there's a library call for that. And certainly it's only right to emit valid XML.
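Brian doesn't say which library call he has in mind. The sketch below is one illustration only, using Python's standard library rather than TPP's own code. `xml.sax.saxutils.quoteattr` escapes `&`, `<` and `>`, then quotes the value so that embedded `"` and `'` are also safe. Parsing the result gives back the original name. `alternative_protein_tag` is a hypothetical helper, and the names are the report's examples plus the other reserved characters Matt lists.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import quoteattr


def alternative_protein_tag(name: str) -> str:
    """Hypothetical helper: build an <alternative_protein> element with the
    protein name escaped and quoted for use as an attribute value."""
    # quoteattr() escapes &, < and >, then wraps the value in quotes,
    # choosing ' or " (or escaping " as &quot;) so embedded quotes are safe.
    return f'<alternative_protein protein={quoteattr(name)} num_tol_term="2"/>'


# Names from the bug report, plus the other reserved characters Matt lists.
names = ["tr|Q9UMA4|IFN-<alpha>2", "V<beta>14", "A&B", "5' end", 'say "hi"']

for name in names:
    tag = alternative_protein_tag(name)
    # Round trip: an XML parser returns the original, unescaped name.
    recovered = ET.fromstring(tag).get("protein")
    print(tag, recovered == name)
```

The first line printed is `<alternative_protein protein="tr|Q9UMA4|IFN-&lt;alpha&gt;2" num_tol_term="2"/> True`. That escaped text is exactly what Brian is worried about. It no longer matches the FASTA header byte for byte, so downstream tools would have to compare parsed attribute values, not raw file text.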
But I do think it might be wisest to sidestep the whole mess. Such names are valid FASTA, but also unconventional (based on many years of TPP not bumping into this). Even once converted to valid XML, I suspect they may cause other problems downstream, since the escaped names no longer exactly match the FASTA. I suspect you're damned if you do and damned if you don't.

Brian

On Wed, Nov 11, 2009 at 11:25 AM, Matthew Chambers <[email protected]> wrote:
>
> What about the other reserved characters in XML that are valid in FASTA?
> "
> '
> &
>
> Not escaping could also break downstream software, especially with &,
> which should always begin an escape sequence. :(
>
> -Matt
>
>
> Brian Pratt wrote:
> > Granted, this is a defect, but that's still an unfortunate choice of
> > characters. Even with the correction, I can imagine this tripping
> > up other software downstream, since the properly escaped XML would no
> > longer match the FASTA on a literal basis. I don't suppose your
> > users could be induced to use { and } or [ and ] or ( and ) instead of
> > < and >?
> >
> > Brian
> >
> > On Tue, Nov 10, 2009 at 9:43 PM, Simon Michnowicz
> > <[email protected] <mailto:[email protected]>> wrote:
> >
> >     Dear Group,
> >
> >     I would like to flag a possible bug in a TPP tool. (Sorry in
> >     advance if this is the wrong forum to report bugs.)
> >
> >     One of our users has reported issues with a TPP pepXML tool (he
> >     was using Mascot, so I assume he was using Mascot2XML.exe).
> >
> >     Our FASTA database has protein entries with special characters in
> >     them, e.g.
> >
> >     *IFN-<alpha>2*
> >
> >     *&*
> >
> >     *V<beta>14*
> >
> >     This generated a pepXML file that was not valid XML, as the tags
> >     were not escaped properly.
> >
> >     *<alternative_protein protein="tr|Q9UMA4|IFN-<alpha>2"
> >     num_tol_term="2" peptide_prev_aa="-" peptide_next_aa="S"/>*
> >
> >     regards
> >
> >     Simon Michnowicz
> >     Duty Programmer
> >     Australian Proteomics Computation Facility
> >     Ludwig Institute For Cancer Research
> >     Royal Melbourne Hospital,
> >     Victoria
> >     Tel: (+61 3) 9341 3155
> >     Fax: (+61 3) 9341 3104
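Any conforming XML parser should reproduce the failure Simon reports, because XML does not allow a raw `<` or a bare `&` inside an attribute value. Below is a minimal Python sketch of a well-formedness check. `is_well_formed` is a hypothetical helper, not a TPP tool. It only checks well-formedness, not the pepXML schema.

```python
from xml.etree import ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if xml_text parses as XML.
    Checks well-formedness only; no pepXML schema validation."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The element from the report, with the unescaped protein name.
broken = ('<alternative_protein protein="tr|Q9UMA4|IFN-<alpha>2" '
          'num_tol_term="2" peptide_prev_aa="-" peptide_next_aa="S"/>')
# The same element with < and > written as entity references.
fixed = broken.replace("<alpha>", "&lt;alpha&gt;")

print(is_well_formed(broken))  # False
print(is_well_formed(fixed))   # True
```

For a whole pepXML file, the same `try` wrapped around `ET.parse(path)` would work. It catches syntax errors like the one reported, but not schema violations.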
