Hi Liam,

I misdiagnosed the problem. The actual problem seems to be that the XML file 
I am parsing declares a file entity whose path contains a non-ASCII Unicode 
character (£) that needs to be escaped.

Here is the XML I am trying to parse:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" 
"W:/matlab/sys/namespace/docbook/v4/dtd/docbookx.dtd" [
<!ENTITY sect-002  SYSTEM "./uc£_html_files/image-000-chapter.xfrag">
]>
<book lang="en">
<?dbhtml filename="uc£.html"?>
<bookinfo><title></title><subtitle></subtitle><pubdate>31-Jul-2022 11:08:41</pubdate></bookinfo>&sect-002;</book>

Here is the error returned by the parser.

"Entity 'sect-002' failed to parse\n"

The parser escapes high-order characters in the URL for the main XML file but 
apparently does not do the same for file entities declared in the DTD.
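
To make the escaping concrete, here is a minimal sketch of what I mean. It 
assumes the escaping in question is percent-encoding of each UTF-8 byte above 
0x7F, and that the entity path is already UTF-8. The helper name 
escape_high_bytes is hypothetical; it is not part of libxml2, and I have not 
confirmed that this matches what the parser does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a libxml2 function.
 * Copies the UTF-8 string `in` to `out`, replacing every byte >= 0x80
 * with its %XX percent-escape. ASCII bytes are copied unchanged.
 * Returns the length written (excluding the NUL terminator),
 * or -1 if `out` is too small to hold the result. */
static int escape_high_bytes(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; ++p) {
        if (*p >= 0x80) {
            /* Need room for '%', two hex digits, and the final NUL. */
            if (o + 3 >= out_size)
                return -1;
            out[o++] = '%';
            out[o++] = hex[*p >> 4];
            out[o++] = hex[*p & 0x0F];
        } else {
            /* Need room for this byte and the final NUL. */
            if (o + 1 >= out_size)
                return -1;
            out[o++] = (char)*p;
        }
    }
    out[o] = '\0';
    return (int)o;
}
```

Applied to the system identifier in the entity declaration above, this turns 
the two UTF-8 bytes of £ (0xC2 0xA3) into %C2%A3 and leaves the rest of the 
path untouched. Whether the parser would then resolve the escaped path is 
something I have not verified.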

I am currently trying to convert a Xerces-c/Xalan-c application to libxml/xslt. 
This is because Xalan-c is unable to execute the Docbook FO stylesheet. My 
Xerces-c implementation uses a custom entity resolver to resolve file entities. 
I might need a custom entity resolver to fix the problem with the libxml2 
implementation. However, libxml2 does not seem to support custom entity 
resolvers. At least, I have not been able to find this feature in the 
documentation or in the libxml2 code base on GitHub.

I would appreciate any help you can give in finding a solution.

Regards,

Paul


From: Liam R E Quin <l...@holoweb.net>
Sent: Saturday, July 30, 2022 4:02 PM
To: Paul Kinnucan <pa...@mathworks.com>; xml@gnome.org
Subject: Re: [xml] How can I parse an XML file whose filesystem path is a 
Unicode string?

On Sat, 2022-07-30 at 17:15 +0000, Paul Kinnucan via xml wrote:
> Hi,
>
> I need to parse XML files whose paths may contain Unicode characters,
> for example,
>
> W:\jtbug\uc£\mydoc£.xml
>
> What is the best way to do this with libxml2?

Sounds like you are using Microsoft Windows and are going to use the C
API? How far have you got? What problems are you having exactly? What
errors do you get?

--
Liam Quin, https://www.delightfulcomputing.com/ https://www.paligo.net/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Antique illustrations, stock images, text: http://www.fromoldbooks.org
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
