I believe this is related to the problem I reported several days ago
(regarding the mechanism by which XXE identifies and loads XML files
with no associated DTD), but I wanted to make sure that the problem had
been identified.  I discovered that with such documents (DTD-less
DocBook files), if they contained the 0xa0 (non-breaking-space)
character, it would be translated automatically to the entity &nbsp;
when the file is saved.  However, the file is still saved without a DTD
reference and so is not well-formed (the &nbsp; entity is undefined).
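
For comparison, here is a minimal sketch of how a save routine could
handle U+00A0 when the document has no DTD.  It assumes only Python's
standard library, and the helper name escape_text_no_dtd is hypothetical,
not anything in XXE.  The idea is to write the numeric character reference
&#160;, which needs no declaration, instead of the named entity &nbsp;.
Because the file declares UTF-8, writing the raw character would be
equally well-formed.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

NBSP = "\u00a0"

def escape_text_no_dtd(text: str) -> str:
    """Hypothetical helper: escape character data for a DTD-less document.

    escape() replaces &, < and >; the extra mapping then writes U+00A0 as
    the numeric reference &#160;, which is legal without any DTD, instead
    of the undeclared named entity &nbsp;.
    """
    return escape(text, {NBSP: "&#160;"})

text = "We want a non-breaking-space." + NBSP + " There should be two spaces"
serialized = "<para>" + escape_text_no_dtd(text) + "</para>"
print(serialized)
# <para>We want a non-breaking-space.&#160; There should be two spaces</para>

# Round trip: a parser with no DTD accepts it and restores the same text.
assert ET.fromstring(serialized).text == text
```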

For example, given the (well-formed) input file:

---
<?xml version="1.0" encoding="UTF-8"?>
<article>
  <title>Entity &amp;#160; problems</title>

  <para>We want a non-breaking-space.&#160; There should be two spaces between
  the first sentence and the second.</para>
</article>
---

If it is then opened with XXE and resaved, it is saved as:

---
<?xml version="1.0" encoding="UTF-8"?>
<article>
  <title>Entity &amp;#160; problems</title>

  <para>We want a non-breaking-space.&nbsp; There should be two spaces between
  the first sentence and the second.</para>
</article>
---

This is clearly not well-formed: &nbsp; is not one of the five entities
predefined by XML, and there is no DTD to declare it.  Again, I think this
is related to the mechanism XXE uses, which includes the original file as
an external entity in order to validate it, but I wanted to make sure the
scope of the problem was exposed to your development team.
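
One way to confirm this outside XXE is to run both versions through an
ordinary conforming parser.  The sketch below assumes Python's standard
xml.etree.ElementTree module, and check_well_formed is a hypothetical
helper.  The original parses, with &#160; expanded to a literal U+00A0 in
the para text, while the resaved copy should be rejected with an
undefined-entity error.

```python
import xml.etree.ElementTree as ET

# The input document from this report, copied verbatim.
ORIGINAL = """<?xml version="1.0" encoding="UTF-8"?>
<article>
  <title>Entity &amp;#160; problems</title>

  <para>We want a non-breaking-space.&#160; There should be two spaces between
  the first sentence and the second.</para>
</article>
"""

# XXE's output: identical except that &#160; in <para> became &nbsp;.
# ("&amp;#160;" in the title does not match "&#160;", so it is unchanged.)
RESAVED = ORIGINAL.replace("&#160;", "&nbsp;")

def check_well_formed(doc: str):
    """Hypothetical helper: return (True, None) if doc parses, else (False, message)."""
    try:
        # Encode first so the bytes match the encoding="UTF-8" declaration.
        ET.fromstring(doc.encode("utf-8"))
    except ET.ParseError as err:
        return False, str(err)
    return True, None

print(check_well_formed(ORIGINAL))  # (True, None)

# The numeric reference is expanded to a literal U+00A0 in the parsed text.
para = ET.fromstring(ORIGINAL.encode("utf-8")).find("para")
print("\u00a0" in para.text)  # True

# &nbsp; is not predefined and no DTD declares it, so parsing should fail
# with an "undefined entity" message.
print(check_well_formed(RESAVED))
```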

Take care,

    John L. Clark