John L. Clark wrote:
> I believe this is related to the problem I reported several days ago
> (regarding the mechanism by which XXE identifies and loads XML files
> with no associated DTD), but I wanted to make sure that the problem had
> been identified. I discovered that with such documents (DTD-less
> DocBook files), if they contained the 0xa0 (non-breaking-space)
> character, it would be translated automatically to the &nbsp; entity
> when the file is saved. However, the file is still saved without a DTD
> reference and so is invalid (the entity is undefined).
DTD-less DocBook files are not *documents*. You can call them fragments, modules, external entities, whatever.

> For example, given the (well-formed) input file:
>
> ---
> <?xml version="1.0" encoding="UTF-8"?>
> <article>
> <title>Entity &#160; problems</title>
>
> <para>We want a non-breaking-space.  There should be two spaces between
> the first sentence and the second.</para>
> </article>
> ---
>
> If it is then opened with XXE and resaved, it is saved as:
>
> ---
> <?xml version="1.0" encoding="UTF-8"?>
> <article>
> <title>Entity &nbsp; problems</title>
>
> <para>We want a non-breaking-space. There should be two spaces between
> the first sentence and the second.</para>
> </article>
> ---
>
> Which is clearly not well-formed.

I don't agree: the above article is a perfectly valid external entity which is supposed to be referenced by a master document which has the proper <!DOCTYPE>. If you use Emacs to write ``by hand'' a DocBook article which is intended to be an external entity referenced by a DocBook book, you would write "&nbsp;" not "&#160;".

> Again, I think this is related to the mechanism that XXE uses which includes
> the original file as an external
> entity in order to validate it, but I wanted to make sure the scope of
> the problem was exposed to your development team.

We have already fixed your XML declaration problem. That was a real bug (because we had overlooked something in the way we trick XXE into treating mere fragments as first-class documents).

What you describe in this email is clearly *not a bug*.

* If your article is a ``module'' which is intended to be part of a master document, outputting "&nbsp;" is OK because other applications (saxon, xmllint, etc.) are not supposed to load your article as a stand-alone *document*.

* If you feel uncomfortable with this (I don't see why, but...), never ever use <!DOCTYPE>-less modules. Always add a <!DOCTYPE> to all your document templates. If after that, you want to use them as modules, use XIncludes and not references to external entities (Options dialog box, Edit tab).

* If you still feel uncomfortable with this, another solution is to configure XXE to not save characters as entity references (Options dialog box, Save tab).
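
To illustrate the difference between loading a module stand-alone and loading it through a master document, here is a minimal Python sketch using the standard library's expat bindings. It has nothing to do with how XXE works internally. The file name article.xml, the entity name article, and the helpers text_of and parses_standalone are made up for the example, and the module is a shortened version of the article quoted above.

```python
import xml.parsers.expat as expat

# Hypothetical module, as saved without a <!DOCTYPE> and with &nbsp; kept.
MODULE = "<article><title>Entity&nbsp;problems</title></article>"

# Hypothetical master document: it declares nbsp and references the module
# as an external entity.
MASTER = """<!DOCTYPE book [
  <!ENTITY nbsp "&#160;">
  <!ENTITY article SYSTEM "article.xml">
]>
<book>&article;</book>"""


def text_of(xml, modules=None):
    """Return the character data of `xml`.

    External entities are looked up in the `modules` dict
    (system id -> content) instead of being read from disk.
    """
    modules = modules or {}
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def load_module(context, base, system_id, public_id):
        # The child parser reuses the entity declarations of the master.
        child = parser.ExternalEntityParserCreate(context)
        child.CharacterDataHandler = chunks.append
        child.Parse(modules[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = load_module
    parser.Parse(xml, True)
    return "".join(chunks)


def parses_standalone(xml):
    """True if `xml` is accepted when loaded on its own."""
    try:
        text_of(xml)
        return True
    except expat.ExpatError:
        return False


if __name__ == "__main__":
    print(parses_standalone(MODULE))
    print(parses_standalone(MODULE.replace("&nbsp;", "&#160;")))
    print(repr(text_of(MASTER, {"article.xml": MODULE})))
```

Loaded on its own, the module is rejected because nothing declares nbsp. The same module is accepted when it is pulled in by a master document whose <!DOCTYPE> declares the entity, and a module that uses the numeric reference instead is accepted either way.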

