Yes, one would want to escape everything properly - happily there's a
library call for that.  And certainly it's only right to emit valid XML.
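
For illustration only, here is a minimal Python sketch of that kind of library call. TPP itself is C++, and this is not what Mascot2XML does; the helper name `escape_attr` is hypothetical, and the protein name is the one from Simon's report.

```python
from xml.sax.saxutils import escape

def escape_attr(value: str) -> str:
    """Escape a string for use inside a quoted XML attribute value.

    escape() handles &, < and > by default; the extra mapping covers
    the two quote characters, which are also valid in FASTA headers.
    """
    return escape(value, {'"': "&quot;", "'": "&apos;"})

# Protein name taken from the report; the element is trimmed for brevity.
protein = "tr|Q9UMA4|IFN-<alpha>2"
line = f'<alternative_protein protein="{escape_attr(protein)}" num_tol_term="2"/>'
print(line)
```

The ampersand is replaced first, so text that is already escaped would be escaped a second time; the helper should only ever see raw FASTA text.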

But I do think that it might be wisest to sidestep the whole mess - it's
valid FASTA but also unconventional (based on many years of TPP not bumping
into this), and even converted to valid XML I suspect it may cause other
problems downstream since it no longer exactly matches the FASTA.  I suspect
you're damned if you do and damned if you don't.
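
To make the mismatch concrete, here is a small Python sketch (an illustration, not TPP code). It assumes two kinds of downstream consumer: one that reads the file with a real XML parser, and one that searches the raw text for the FASTA name. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

fasta_name = "tr|Q9UMA4|IFN-<alpha>2"

# quoteattr() escapes the value and wraps it in quotes.
xml_text = f"<alternative_protein protein={quoteattr(fasta_name)}/>"

# A consumer using an XML parser gets the original FASTA name back.
parsed = ET.fromstring(xml_text).get("protein")
print(parsed == fasta_name)

# A consumer doing a literal text search on the file no longer finds it.
raw_match = fasta_name in xml_text
print(raw_match)
```

So the escaped file should be fine for tools that parse it as XML, and it is the tools that match on raw text that are likely to trip.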

Brian
On Wed, Nov 11, 2009 at 11:25 AM, Matthew Chambers <
[email protected]> wrote:

>
> What about the other reserved characters in XML that are valid in FASTA?
> "
> '
> &
>
> Not escaping could also break downstream software - especially with &
> which should always begin an escape sequence. :(
>
> -Matt
>
>
> Brian Pratt wrote:
> > Granted, this is a defect - but that's still an unfortunate choice of
> > characters.  Even with the correction I can imagine this tripping
> > up other software downstream since the properly escaped XML would no
> > longer match the FASTA on a literal basis.  I don't suppose your
> > users could be induced to use { and } or [ and ] or ( and ) instead of
> > < and > ?
> >
> > Brian
> >
> > On Tue, Nov 10, 2009 at 9:43 PM, Simon Michnowicz
> > <[email protected]> wrote:
> >
> >     Dear Group,
> >
> >     I would like to flag a possible bug in a TPP tool. (Sorry in
> >     advance if this is the wrong forum to report bugs.)
> >
> >     One of our users has reported issues with a TPP pepXML tool (he
> >     was using Mascot, so I assume it was Mascot2XML.exe).
> >
> >     Our FASTA database has protein entries with special characters in
> >     them, e.g.
> >
> >     *IFN-<alpha>2*
> >
> >     *&*
> >
> >     *V<beta>14 *
> >
> >     This generated a pepXML file that was not valid XML, as the
> >     special characters were not escaped properly.
> >
> >
> >
> >
> >     *<alternative_protein protein="tr|Q9UMA4|IFN-<alpha>2"
> >     num_tol_term="2" peptide_prev_aa="-" peptide_next_aa="S"/>*
> >
> >
> >
> >
> >     regards
> >
> >
> >
> >
> >     Simon Michnowicz
> >     Duty Programmer
> >     Australian Proteomics Computation Facility
> >     Ludwig Institute For Cancer Research
> >     Royal Melbourne Hospital,
> >     Victoria
> >     Tel: (+61 3) 9341 3155
> >     Fax: (+61 3) 9341 3104
> >

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"spctools-discuss" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/spctools-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---
