Re: [users@httpd] Automatically setting the correct Content-Type for XML documents based on namespace

Rich Bowen Thu, 28 May 2026 07:05:42 -0700

As you noted, mod_mime_magic uses its own magic file format and doesn't do XML 
parsing. It can match byte patterns at fixed offsets, which works for binary 
formats but isn't sufficient for namespace-aware XML detection where the 
namespace URI can appear at varying positions.


Practical approaches available today

Option A: File extensions (simplest)

If your XML files use distinct extensions (.svg, .xhtml, .mathml, etc.), 
AddType in mod_mime handles this trivially:

AddType image/svg+xml .svg
AddType application/mathml+xml .mathml
AddType application/xhtml+xml .xhtml

This is what most deployments do and it's fast — no file inspection needed.

Option B: mod_lua (content-aware, per-request)

mod_lua can hook into the type-checking phase and inspect file contents. A 
LuaHookTypeChecker script could open the file, parse enough XML to extract 
the root element and namespace, then set r.content_type accordingly. 
Something like:

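require "apache2"

function type_check(r)
        -- Read the first 1024 bytes and look for a known namespace
        local f = io.open(r.filename, "r")
        if not f then return apache2.DECLINED end
        local head = f:read(1024)
        f:close()
        -- f:read returns nil for an empty or unreadable file
        if not head then return apache2.DECLINED end
        -- Dots are escaped because "." matches any character in Lua patterns
        if head:match('xmlns%s*=%s*"http://www%.w3%.org/2000/svg"') then
                r.content_type = "image/svg+xml"
                return apache2.OK
        end
        -- add more namespace mappings…
        return apache2.DECLINED
end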

This is lightweight, doesn't require libxml2, and you could load the 
/usr/share/mime/XMLnamespaces file as your lookup table.
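
To make the lookup-table idea concrete, here is a minimal sketch of reading 
that file into a Lua table. It assumes each line holds three space-separated 
fields (namespace URI, root element name, media type), as in the lines you 
quoted. The names parse_namespaces, lookup, and load_namespaces are made up 
for illustration and are not part of mod_lua.

```lua
-- Parse the text of an XMLnamespaces file into a nested lookup table:
--   map[namespace_uri][root_element_name] = media_type
-- Assumed line format (taken from the examples quoted in this thread):
--   <namespace URI> <root element name> <media type>
function parse_namespaces(text)
    local map = {}
    for line in text:gmatch("[^\r\n]+") do
        local ns, root, mtype = line:match("^(%S+) (%S+) (%S+)$")
        if ns then
            map[ns] = map[ns] or {}
            map[ns][root] = mtype
        end
        -- Lines that do not fit the assumed format are skipped.
    end
    return map
end

-- Return the media type for a namespace URI and root element name,
-- or nil when the table has no matching entry.
function lookup(map, ns, root)
    local by_root = map[ns]
    return by_root and by_root[root]
end

-- Hypothetical convenience wrapper: read the file at `path` and parse it.
-- Returns an empty table if the file cannot be opened.
function load_namespaces(path)
    local f = io.open(path, "r")
    if not f then return {} end
    local text = f:read("*a")
    f:close()
    return parse_namespaces(text)
end
```

In a hook you would call load_namespaces("/usr/share/mime/XMLnamespaces") and 
then lookup() with whatever namespace and root element name you extracted. 
Depending on your LuaScope setting the table may be rebuilt frequently, so it 
is worth measuring before relying on this in production.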

Option C: mod_ext_filter (heavyweight but flexible)

You could use an external script that reads the file and outputs the correct 
Content-Type, but this has performance implications since it forks a process 
per request.

A dedicated module that uses libxml2 (as mod_xml2enc already does) to parse 
just the root element + namespace, then consults a mapping table — that would 
be the "right" solution. It doesn't exist yet. If you're inclined to write one, 
the [email protected] list is where to propose it. The GitHub mirror 
(github.com/apache/httpd) is read-only and PRs there tend to get overlooked.
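
Until such a module exists, the "root element plus namespace" step can be 
approximated in Lua for experimentation, for instance in place of the 
single-namespace match in the Option B hook. The sketch below is 
pattern-based, not a real XML parser: it assumes the root start tag fits in 
the chunk you read, and it can be fooled by a ">" inside an attribute value 
or by markup inside a DOCTYPE internal subset. The function name 
root_namespace is hypothetical.

```lua
-- Given the first chunk of a document, return the namespace URI and local
-- name of the root element. Returns nil if no start tag is found; the
-- namespace is nil if the root tag does not declare one for itself.
function root_namespace(head)
    -- Drop comments and processing instructions (including the XML
    -- declaration) so that markup inside them is not mistaken for the root.
    head = head:gsub("<!%-%-.-%-%->", "")
    head = head:gsub("<%?.-%?>", "")

    -- The first tag that starts with a name character is taken as the root;
    -- "<!DOCTYPE" and similar declarations do not match because of the "!".
    local name, attrs = head:match("<([%a_][%w_%.%-:]*)([^>]*)>")
    if not name then return nil end

    -- Split an optional prefix from the local name, e.g. "svg:svg".
    local prefix, localname = name:match("^([^:]+):(.+)$")
    local decl
    if prefix then
        -- Escape punctuation so the prefix is matched literally.
        local escaped = prefix:gsub("%p", "%%%0")
        decl = "%sxmlns:" .. escaped
    else
        localname = name
        decl = "%sxmlns"
    end

    -- Accept single or double quotes around the namespace URI.
    local _, ns = attrs:match(decl .. "%s*=%s*([\"'])(.-)%1")
    return ns, localname
end
```

The two return values are what you would feed into a mapping table such as 
the XMLnamespaces file mentioned above.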

You're correct that mod_mime_magic predates libmagic as a standalone library. 
There's been periodic discussion about replacing it with a libmagic wrapper, 
but no one has done the work. It's a candidate for a "help wanted" contribution 
if you're interested.

Hope that helps point you in a useful direction.

--Rich

> On May 14, 2026, at 4:42 PM, John Scott <[email protected]> wrote:
> 
> Hi,
> XML is used for many different kinds of files, and Apache serving up all XML 
> files as having application/xml type unless explicitly told otherwise is 
> suboptimal. For example, correct usage of more specific types can be useful 
> for content negotiation: a user agent might have a preference between 
> 'text/vcard' and 'application/vcard+xml', for example. Just as Apache does 
> for XHTML, these files usually contain enough information to identify their 
> correct type and, when the top-level element has a designated XML namespace, 
> this can be done without any chance of error: determining a more specific 
> media type for an XML document is then a deterministic procedure, not a 
> matter of guesswork.
> 
> I can't find any existing solutions using Apache as the HTTP server, though, 
> such as with a module. Is this just something no one has gotten around to 
> implementing yet (either in the Apache HTTP Server project or on their own)? 
> Has anyone solved this problem before? Here's some research I've done on the 
> matter.
> 
> • It appears there is precedent for using libxml2 to implement functionality 
> in httpd, but the only obvious one is in mod_xml2enc 
> https://httpd.apache.org/docs/trunk/mod/mod_xml2enc.html which is about 
> handling text encodings on-the-fly as a filter. If libxml2 is already used 
> some for Apache modules, then using it to parse an XML document's root 
> element, namespace, and DOCTYPE declaration ought to be pretty 
> straightforward, as a first step to inform the choice of a superior media 
> type. Do any other parts of the server do anything like this? If not, I'm 
> hopeful a quality implementation of this could be considered for inclusion in 
> the core distribution.
> 
> • To heuristically determine media types of files generally, mod_mime_magic 
> https://httpd.apache.org/docs/trunk/mod/mod_mime_magic.html is described as 
> working "like" the file(1) command. Unfortunately, Apache uses its own 
> home-grown implementation for this job ("This module is derived from a free 
> version of the file(1) command for Unix"), and it expects to be used with a 
> MimeMagicFile in the format of the one supplied by Apache. This means that 
> even when improvements are made to the file(1) command and libmagic library 
> that the majority of libre systems use, it will not trickle down to Apache. 
> Is there a reason for this apparent code duplication?
> Maybe it comes from a time before the libmagic library 
> https://www.darwinsys.com/file/ existed; curiously, that upstream project is 
> the same as what mod_mime_magic is based on anyway.
> 
> • This subject matter is based on the premise that the name of an XML 
> document's root element, along with at least one of a document type 
> declaration or an XML namespace declaration, can uniquely identify an XML 
> document's kind and inform user agents of how to use those XML files fetched. 
> This can be materialized from two different approaches, neither of which I've 
> been able to pull off.
> 
> ◦ The IANA has the registration of a media type called 
> application/prs.implied-document+xml 
> https://www.iana.org/assignments/media-types/application/prs.implied-document+xml
>  which allows this concept in general. The comments for the registration 
> express this well:
>> This media type identifies a meta-format that encompasses all XML-based 
>> formats which are identified by a particular name of the root element, 
>> optionally together with a namespace URI or the PUBLIC identifier stored in 
>> the DTD. It it intended for use in applications that describe files using 
>> media types, but do not have sufficient heuristics to output a more specific 
>> media type. In such a case, the application may parse XML and use the name 
>> of the root element and the DTD to the "root", "ns", and "public" parameters.
> It even gives an example: the common image/svg+xml type is approximately 
> equivalent to
> application/prs.implied-document+xml;root=svg;ns="http://www.w3.org/2000/svg";public="-//W3C//DTD SVG 1.1//EN"
> If something somewhere in the pipeline could express a media type like this, 
> then the canonical image/svg+xml could be substituted as an alias somewhere.
> 
> ◦ An orthogonal issue is, in what way could we define such "canonical media 
> types" that correspond to some XML document type? I am pleased to discover 
> that the shared-mime-info database specification, commonly used on GNU/Linux, 
> already provides for this! 
> https://specifications.freedesktop.org/shared-mime-info/latest/ar01s02.html#id-1.3.9
> As a matter of fact, on my Debian Trixie system, a file 
> /usr/share/mime/XMLnamespaces already exists. This is a short plain text file 
> with lines such as
>> http://www.abisource.com/awml.dtd abiword application/x-abiword
>> http://www.w3.org/1998/Math/MathML math application/mathml+xml
>> http://www.w3.org/1999/xhtml html application/xhtml+xml
> and so on. So regardless of whether the application/prs.implied-document+xml 
> media type is used somewhere as an internal representation, this is a 
> straightforward mapping that provides everything needed. If only Apache could 
> use it.
> 
> If any solutions exist along these lines, I don't know of them yet but would 
> love to. Otherwise, I ask of the sympathetic readership: how do you handle 
> this?


