I am sponsoring this case for Nick Kew. It is a straightforward addition
of two more Apache httpd modules. Timeout set to 10/20/2009.
Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
modules mod_proxy_html and mod_xml2enc for Apache HTTPD 2.2
1.2. Name of Document Author/Supplier:
Author: Nick Kew
1.3 Date of This Document:
12 October, 2009
4. Technical Description
1. Introduction
1.1 Title
modules mod_proxy_html and mod_xml2enc for Apache HTTPD 2.2
1.2 Author
Nick Kew <niq at sun.com>
1.3 Date
2009-10-09
1.4 Customers
Users of Apache HTTPD as a reverse proxy/gateway
1.5 Email Aliases
1.5.2 Responsible Engineer
niq at sun.com
1.5.4 Interest List
webstack-discuss at opensolaris.org
2. Summary
2.1 Description
mod_proxy_html is a markup-aware filter capable of rewriting
HTML and XHTML on-the-fly. It is commonly required in a
reverse proxy situation, such as where Apache HTTPD is used
as a gateway providing external access to servers hosted on
an internal or private network.
mod_xml2enc manages character encoding (charset) support on
behalf of mod_proxy_html and other filters. It is required to
support internationalisation correctly where a proxied server
delivers content in character sets that are not ASCII- or
Unicode (UTF-8)-compatible.
2.2 Risks and Assumptions
N/A
3. Business Summary
3.1 Problem Area
A web gateway, or reverse proxy, is often used to provide external
access to servers hosted on a private network. Often such servers
include or generate documents containing links that cannot be
reached or even resolved from outside the network.
mod_proxy_html serves to rewrite such links into the gateway's
address space, so that they can be addressed from outside the
private network.
mod_proxy_html is one of several Apache filter modules based on
parsing documents with libxml2. This library uses Unicode
(UTF-8) internally, and is capable of reading documents in a
limited range of other character encodings.
mod_xml2enc is required both to detect automatically the
character set of incoming documents so they can be parsed
correctly, and to support character encodings not directly
supported by libxml2. mod_xml2enc provides strong
internationalisation support for mod_proxy_html, and for other
libxml2-based modules such as mod_transform (XInclude/XSLT
processing) and mod_xml2 (SAX2 event-based parsing).
3.2 Market/Requester
mod_proxy_html has been requested by Web Stack users.
3.3 Business Justification
Packaging these modules provides significant added value for
Web Stack users. Specifically, it enables use of the Web Stack
in a major class of applications that are not possible without
these modules.
3.4 Competitive Analysis
The reverse proxy/gateway is a common application, and one in
which Apache HTTPD has a strong market share. In addition to
widely supported proxy functionality such as load balancing,
caching/acceleration, and web application firewalling (via a
third-party product, mod_security), Apache has a significant
advantage in its filter architecture. This enables a wide range
of applications in the areas of content transformation and
aggregation.
One such is mod_proxy_html, which enables Apache to serve as
gateway to systems that would fail if accessed through a
basic non-content-aware gateway.
Earlier versions of mod_proxy_html have been packaged for some
years by many competitors, such as Linux distributions, FreeBSD,
and even Windows packagers.
3.5 Opportunity Window/Exposure
These modules are relevant for the foreseeable future. Earlier
versions have been in widespread use for several years.
3.6 How will you know when you are done?
IPS packages will be available. They will be verified as rewriting
links in a proxy context, and as correctly handling "exotic"
character sets, using test cases such as Pravda (Cyrillic script,
incorporating complex transcluded content).
4. Technical Description:
4.1 Details
This is a straightforward packaging effort.
4.2 Bug/RFE Number(s)
CR 6716096 (mod_xml2enc) and CR 6716092 (mod_proxy_html)
OSR 9711 (mod_xml2enc) and OSR 9712 (mod_proxy_html)
4.3 In Scope
Documentation will be updated.
4.4 Out of Scope
The tutorial at www.apachetutor.org should be updated and may
additionally be duplicated at Sun for Web Stack users. However,
this falls outside this project.
4.5 Interfaces
Both modules introduce a number of configuration options
for Apache's httpd.conf. mod_proxy_html also packages a sample
configuration file, proxy_html.conf, which configures it to parse
documents by default as standard W3C HTML 4.01 or XHTML 1.0.
In addition, mod_xml2enc exports an API/ABI for internationalisation
support in the header file mod_xml2enc.h. The directives and the
API/ABI are expected to remain compatible for at least the lifetime
of Apache 2.2.
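For orientation, the kind of declaration carried by the sample file can be
sketched as follows. This is an abbreviated, illustrative fragment and not
the packaged file: the selection of elements and attributes shown here is
an assumption made for brevity, and the delivered proxy_html.conf remains
authoritative.

```apache
# Illustrative excerpt only; the packaged proxy_html.conf is authoritative.

# Declare which attributes of which elements carry URLs to be rewritten.
ProxyHTMLLinks  a       href
ProxyHTMLLinks  area    href
ProxyHTMLLinks  link    href
ProxyHTMLLinks  img     src longdesc usemap
ProxyHTMLLinks  form    action
ProxyHTMLLinks  script  src

# Declare scripting-event attributes whose values may embed URLs.
# The list is shortened here for illustration.
ProxyHTMLEvents onclick ondblclick onmouseover onload onsubmit
```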
Exported Interfaces
NAME                    STABILITY    DESCRIPTION
-----------------------------------------------------------------------
/usr/apache2/2.2/include/mod_xml2enc.h
                        Uncommitted  Header file exporting
                                     i18n API/ABI
/etc/apache2/2.2/samples-conf.d/proxy_html.conf
                        Uncommitted  Configuration sample
xml2EncDefault          Uncommitted  Configuration directive
xml2EncAlias            Uncommitted  Configuration directive
xml2StartParse          Uncommitted  Configuration directive
ProxyHTMLEvents         Uncommitted  Configuration directive
ProxyHTMLLinks          Uncommitted  Configuration directive
ProxyHTMLURLMap         Uncommitted  Configuration directive
ProxyHTMLDoctype        Uncommitted  Configuration directive
ProxyHTMLFixups         Uncommitted  Configuration directive
ProxyHTMLMeta           Uncommitted  Configuration directive
ProxyHTMLInterp         Uncommitted  Configuration directive
ProxyHTMLExtended       Uncommitted  Configuration directive
ProxyHTMLStripComments  Uncommitted  Configuration directive
ProxyHTMLLogVerbose     Uncommitted  Configuration directive
ProxyHTMLBufSize        Uncommitted  Configuration directive
ProxyHTMLCharsetOut     Uncommitted  Configuration directive
ProxyHTMLEnable         Uncommitted  Configuration directive
SUNWapch22m-xml2enc     Committed    Package name
SUNWapch22m-proxy-html  Committed    Package name
Imported Interfaces
NAME                    STABILITY    DESCRIPTION    ARC CASE REF
-----------------------------------------------------------------------
libxml2                 Stable       XML library    PSARC/2008/032
Apache HTTPD            Uncommitted  Apache HTTPD   PSARC/2007/169
4.6 Doc Impact
Documentation is available at the originator's website.
The configuration sample will direct users there.
4.7 Admin/Config Impact
New Apache configuration options as documented at
http://apache.webthing.com/mod_proxy_html/ and
http://apache.webthing.com/mod_xml2enc/
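As a hedged illustration of these options in use, the fragment below maps
a hypothetical internal server, internal.example.com, under /app/ on the
gateway. The hostname and the /app/ path are assumptions made for the
example, and it further assumes that mod_proxy, mod_proxy_http and
mod_headers are already loaded. The module identifiers are those used by
the upstream sources; the documentation at the URLs above should be
consulted before deployment.

```apache
# Sketch only: internal.example.com and /app/ are hypothetical.
# Module file paths are taken from the appendices of this case.
LoadModule xml2enc_module    /usr/apache2/2.2/libexec/mod_xml2enc.so
LoadModule proxy_html_module /usr/apache2/2.2/libexec/mod_proxy_html.so

# Pull in the link and event definitions from the packaged sample
# (alternatively, copy the sample into the active configuration).
Include /etc/apache2/2.2/samples-conf.d/proxy_html.conf

# Standard reverse-proxy mapping; rewrites requests and HTTP headers only.
ProxyPass        /app/ http://internal.example.com/
ProxyPassReverse /app/ http://internal.example.com/

<Location /app/>
    # Enable the markup-aware filter for responses proxied under /app/.
    ProxyHTMLEnable On

    # Rewrite links in the document body into the gateway's address
    # space: absolute links to the internal host first, then
    # root-relative links.
    ProxyHTMLURLMap http://internal.example.com/ /app/
    ProxyHTMLURLMap / /app/

    # Ask the backend for uncompressed content so it can be parsed
    # (requires mod_headers).
    RequestHeader unset Accept-Encoding
</Location>
```

ProxyPassReverse alone corrects only HTTP headers such as Location; the
ProxyHTMLURLMap rules are what address links embedded in the markup.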
4.8 HA Impact
N/A
4.9 I18N/L10N Impact
By default, all contents are served as UTF-8, regardless of the
character encoding of the original contents. mod_xml2enc supports
automatic detection and administrator overrides of different
character encodings, and conversion of output to an administrator's
choice of encoding.
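A minimal sketch of the administrator overrides mentioned above, assuming
a backend that sends Cyrillic content without reliably declaring its
encoding. The charset names and aliases are illustrative assumptions;
directive syntax is as described in the originator's documentation.

```apache
# Sketch only; the charset choices below are illustrative assumptions.

# Fallback input encoding, used when no encoding can be detected
# from the HTTP headers or from the document itself.
xml2EncDefault windows-1251

# Treat non-standard charset labels sent by a backend as aliases
# for a known encoding (server-wide setting).
xml2EncAlias windows-1251 cp1251 win-1251

# Output encoding. Without this directive output is UTF-8; "*"
# requests output in the same encoding as the input.
ProxyHTMLCharsetOut *
```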
4.10 Packaging/Delivery
This project is part of the Sun Web Stack, and will be packaged
as SUNWapch22m-xml2enc and SUNWapch22m-proxy-html.
It has no impact on existing packages.
4.11 Security Impact
None known.
4.12 Dependencies
This proposal depends on Apache HTTPD 2.2.x and libxml2 2.6 or later.
libxml2: http://arc.opensolaris.org/caselog/PSARC/2008/032/
Apache HTTPD: http://arc.opensolaris.org/caselog/PSARC/2007/169/
5. Reference Documents
http://apache.webthing.com/mod_proxy_html/
http://apache.webthing.com/mod_xml2enc/
http://www.apachetutor.org/admin/reverseproxies
OSR 9711 (mod_xml2enc) and OSR 9712 (mod_proxy_html)
CR 6716096 (mod_xml2enc) and CR 6716092 (mod_proxy_html)
http://arc.opensolaris.org/caselog/PSARC/2008/032/
http://arc.opensolaris.org/caselog/PSARC/2007/169/
6. Projected Availability
2009/12
7. Prototype Availability
The modules are available, and can be compiled "by hand" as
a prototype.
APPENDIX A: Files to be delivered in SUNWapch22m-xml2enc
/usr/apache2/2.2/include/mod_xml2enc.h
/usr/apache2/2.2/libexec/mod_xml2enc.so
/usr/apache2/2.2/libexec/amd64/mod_xml2enc.so
/usr/apache2/2.2/libexec/sparcv9/mod_xml2enc.so
APPENDIX B: Files to be delivered in SUNWapch22m-proxy-html
/etc/apache2/2.2/samples-conf.d/proxy_html.conf
/usr/apache2/2.2/libexec/mod_proxy_html.so
/usr/apache2/2.2/libexec/amd64/mod_proxy_html.so
/usr/apache2/2.2/libexec/sparcv9/mod_proxy_html.so
6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
sfw
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open