Hi all (again),
I ran some tests and modified the code of "encode_for_xml" in the
following way, and it seems to work fine:

    def encode_for_xml(text, wash=False, xml_version='1.0', quote=False):
        """Encodes special characters in a text so that it would be
        XML-compliant.
        @param text: text to encode
        @return: an encoded text"""
        # '&' must be replaced first, otherwise the entities
        # produced below would be escaped a second time.
        text = text.replace('&', '&amp;')
        text = text.replace('<', '&lt;')
        # New: also escape '>', so that a literal ']]>' cannot survive.
        text = text.replace('>', '&gt;')
        if quote:
            text = text.replace('"', '&quot;')
        if wash:
            text = wash_for_xml(text, xml_version=xml_version)
        return text

I repeat that I don't know why all the XML special characters are not
escaped, but even this solution looks semantically wrong to me,
because it doesn't follow the W3C guidelines:
http://www.w3.org/TR/xml/#syntax
A correct function should escape in this way:
    "    &quot;
    '    &apos;
    <    &lt;
    >    &gt;
    &    &amp;
while a CDATA section should not be escaped,
but at least now the XML generated (and stored in bibfmt) is valid.
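
To make the idea concrete, here is a rough sketch of a function that
escapes all five characters but leaves CDATA sections untouched. It is
only an illustration: "escape_xml_keep_cdata" is a hypothetical name
and not an existing Invenio function, it uses the standard library
instead of the Invenio helpers, and I have not tried it on real records.

```python
import re
from xml.sax.saxutils import escape

# Matches one whole CDATA section; non-greedy so that two sections
# in the same string are treated separately.
_CDATA_RE = re.compile(r'(<!\[CDATA\[.*?\]\]>)', re.DOTALL)

def escape_xml_keep_cdata(text):
    """Escape the five XML special characters outside CDATA sections.

    Hypothetical helper, for illustration only.
    """
    # With a capturing group, re.split() returns the CDATA sections at
    # the odd indexes and the ordinary text at the even ones.
    parts = _CDATA_RE.split(text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # CDATA section: copy as it is.
            out.append(part)
        else:
            # escape() handles '&', '<' and '>'; the two quote
            # characters are passed as extra entities.
            out.append(escape(part, {'"': '&quot;', "'": '&apos;'}))
    return ''.join(out)
```

Whether copying CDATA sections verbatim is the right choice for the
records stored in bibfmt is an open question; the sketch only shows
that the two cases can be separated.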
Thanks for your help,
Giovanni
--------------------------------------------------------------
Giovanni Di Milia
IT Specialist at SAO/NASA ADS
Harvard-Smithsonian Center for Astrophysics
60 Garden Street, MS 83
Cambridge, MA 02138 USA
email: [email protected]
--------------------------------------------------------------
On Tue, Oct 30, 2012 at 2:44 PM, Giovanni Di Milia
<[email protected]> wrote:
> Hi all,
> here at ADS we have a problem with some metadata that contain CDATA elements.
> The problem is caused by the export procedure of Invenio that doesn't
> properly encode these elements.
>
> What happens is that all the elements like
> '<![CDATA[ foobar ]]>'
> are converted to
> '&lt;![CDATA[ foobar ]]>'
> and this in XML is an error.
>
> After reading a very similar discussion from 2010 (started by Benoit),
> I suppose that the problem is still in
> invenio.textutils.encode_for_xml()
> which is used in
> bibformat_utils.record_get_xml().
>
> I honestly don't understand why all the tags inside a subfield are
> not escaped (but I suppose there is a good reason) but in case of
> CDATA the tag should be completely escaped.
>
> Thanks for your help,
>
> Giovanni