I believe this is related to the problem I reported several days ago
(regarding the mechanism by which XXE identifies and loads XML files
with no associated DTD), but I wanted to make sure that the problem had
been identified. I discovered that with such documents (DTD-less
DocBook files), if they contained the 0xa0 (non-breaking-space)
character, it would be translated automatically to the &nbsp; entity
when the file is saved. However, the file is still saved without a DTD
reference and so is invalid (the entity is undefined).
For example, given the (well-formed) input file:
---
<?xml version="1.0" encoding="UTF-8"?>
<article>
<title>Entity &#160; problems</title>
<para>We want a non-breaking-space.  There should be two spaces between
the first sentence and the second.</para>
</article>
---
If it is then opened with XXE and resaved, it is saved as:
---
<?xml version="1.0" encoding="UTF-8"?>
<article>
<title>Entity &nbsp; problems</title>
<para>We want a non-breaking-space.&nbsp; There should be two spaces between
the first sentence and the second.</para>
</article>
---
This is clearly not well-formed. Again, I think this is related to the
mechanism XXE uses to validate such a file, which includes the original
file as an external entity, but I wanted to make sure the scope of
the problem was exposed to your development team.
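For anyone who wants to reproduce the difference outside XXE, here is a minimal sketch of a well-formedness check. It uses Python's standard xml.etree.ElementTree parser, not anything from XXE. The helper name is_well_formed and the trimmed sample documents are illustrative assumptions. It also assumes the resaved file contains a named &nbsp; reference with no DTD to declare it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise.

    Hypothetical helper for illustration; it checks well-formedness only
    and performs no DTD validation.
    """
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        # expat reports an undeclared named entity as a parse error
        return False


# Trimmed version of the input file: a numeric character reference
# needs no declaration, so no DTD is required.
ORIGINAL = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<article><title>Entity &#160; problems</title></article>"
)

# Assumed shape of the resaved file: a named entity reference with
# no DOCTYPE that could declare it.
RESAVED = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<article><title>Entity &nbsp; problems</title></article>"
)

if __name__ == "__main__":
    print("original well-formed:", is_well_formed(ORIGINAL))
    print("resaved well-formed: ", is_well_formed(RESAVED))
```

The check relies on the fact that a numeric character reference is always legal, whereas a named entity other than the five predefined ones (amp, lt, gt, apos, quot) has to be declared somewhere.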
Take care,
John L. Clark