The http://www.w3.org/XML/ page seems to indicate latin-1 encoding
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"xml:lang="en"lang="en">
but it seem to contain UTF8 characters (copyright and registertrm).
