Please, need help. I'm sure this is a common request, but I'm obviously missing something.
I'm looking for a simple way to parse HTML and running into two issues. First, I can't find a DTD for HTML that Xerces likes, and second, even without verification it won't parse the most basic HTML (attached Readme.html) I tacked the XML header stuff on the Readme.HTML file, but it doesn't work either way. Anyone attempt this before? -Sandy -- reproduction steps below --- Using XERCES-C_1_1_0_D05-WIN32 under msdev v6 sp1, Windows 98 Typing the DOMCount.exe example (as distributed exe and recompiled debug) Get Assert failure in DOMCOUNT.EXE in DGBDEL.CPP line 47 _BLOCK_TYPE_IS_VALID(pHead->nBlockUse) First of all, the parse seems to puke on the following expressions: + Any comments ... for example <!ENTITY % ContentType "CDATA" -- media type, as per [RFC2045] --> Seems to kill it on the -- media type ... -- portion, same with any of the header comments in the file. The loosehtml4.xml file is attached ... see if it happens for you. The non-debug version of SaxCount does the same.
<!-- This is an EXPERIMENTAL version of the HTML 4.0 "transitional" DTD which includes presentation attributes and elements that W3C expects to phase out as support for style sheets matures. HTML 4.0 includes mechanisms for style sheets, scripting, embedding objects, improved support for right to left and mixed direction text, and enhancements to forms for improved accessibility for people with disabilities. Draft: $Date: 1997/11/07 15:32:37 $ Authors: Dave Raggett <[EMAIL PROTECTED]> Arnaud Le Hors <[EMAIL PROTECTED]> This is work in progress, subject to change at any time. It does not imply endorsement by, or the consensus of, either W3C or members of the HTML working group. Further information about HTML 4.0 is available at: http://www.w3.org/TR/PR-html40 --> <!ENTITY % HTML.Version "-//W3C//DTD HTML 4.0 Transitional//EN" -- Typical usage: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/PR-html40/loose.dtd"> <html> <head> ... </head> <body> ... </body> </html> The URL used as a system identifier with the public identifier allows the user agent to download the DTD and entity sets as needed. The FPI for the strict HTML 4.0 DTD is: "-//W3C//DTD HTML 4.0//EN" and its URL is: http://www.w3.org/TR/PR-html40/strict.dtd Authors should use the strict DTD unless they need the presentation control for user agents that don't (adequately) support style sheets. If you are writing a frameset document you should use the following FPI: "-//W3C//DTD HTML 4.0 Frameset//EN" with the URL: http://www.w3.org/TR/PR-html40/frameset.dtd The following URLs are supported in relation to HTML 4.0 "http://www.w3.org/TR/PR-html40/strict.dtd" (Strict DTD) "http://www.w3.org/TR/PR-html40/loose.dtd" (Loose DTD) "http://www.w3.org/TR/PR-html40/frameset.dtd" (Frameset DTD) "http://www.w3.org/TR/PR-html40/HTMLlat1.ent" (Latin-1 entities) "http://www.w3.org/TR/PR-html40/HTMLsymbol.ent" (Symbol entities) "http://www.w3.org/TR/PR-html40/HTMLspecial.ent" (Special entities) These URLs point to the latest version of each file. To reference this specific revision use the following URLs: "http://www.w3.org/TR/PR-html40-971107/strict.dtd" "http://www.w3.org/TR/PR-html40-971107/loose.dtd" "http://www.w3.org/TR/PR-html40-971107/frameset.dtd" "http://www.w3.org/TR/PR-html40-971107/HTMLlat1.ent" "http://www.w3.org/TR/PR-html40-971107/HTMLsymbol.ent" "http://www.w3.org/TR/PR-html40-971107/HTMLspecial.ent" --> <!--================== Imported Names ====================================--> <!ENTITY % ContentType "CDATA" -- media type, as per [RFC2045] --> <!ENTITY % ContentTypes "CDATA" -- comma-separated list of media types, as per [RFC2045] --> <!ENTITY % Charset "CDATA" -- a character encoding, as per [RFC2045] --> <!ENTITY % Charsets "CDATA" -- a space separated list of character encodings, as per [RFC2045] --> <!ENTITY % LanguageCode "NAME" -- a language code, as per [RFC1766] --> <!ENTITY % Character "CDATA" -- a single character from [ISO10646] --> <!ENTITY % LinkTypes "CDATA" -- space-separated list of link types --> <!ENTITY % MediaDesc "CDATA" -- single or comma-separated list of media descriptors --> <!ENTITY % URL "CDATA" -- a Uniform Resource Locator, see [RFC1808] and [RFC1738] --> <!ENTITY % Datetime "CDATA" -- date and time information. ISO date format --> <!ENTITY % Script "CDATA" -- script expression --> <!ENTITY % FrameTarget "CDATA" -- render in this frame --> <!ENTITY % Text "CDATA" -- render in this frame --> <!-- Parameter Entities --> <!ENTITY % head.misc "SCRIPT|STYLE|META|LINK" -- repeatable head elements --> <!ENTITY % heading "H1|H2|H3|H4|H5|H6"> <!ENTITY % list "UL | OL | DIR | MENU"> <!ENTITY % preformatted "PRE"> <!ENTITY % Color "CDATA" -- a color using sRGB: #RRGGBB as Hex values --> <!-- There are also 16 widely known color names with their sRGB values: Black = #000000 Green = #008000 Silver = #C0C0C0 Lime = #00FF00 Gray = #808080 Olive = #808000 White = #FFFFFF Yellow = #FFFF00 Maroon = #800000 Navy = #000080 Red = #FF0000 Blue = #0000FF Purple = #800080 Teal = #008080 Fuchsia= #FF00FF Aqua = #00FFFF --> <!ENTITY % bodycolors " bgcolor %Color; #IMPLIED -- document background color -- text %Color; #IMPLIED -- document text color -- link %Color; #IMPLIED -- color of links -- vlink %Color; #IMPLIED -- color of visited links -- alink %Color; #IMPLIED -- color of selected links -- "> <!--================ Character mnemonic entities =========================--> <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin1//EN//HTML" "http://www.w3.org/TR/PR-html40/HTMLlat1.ent"> %HTMLlat1; <!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols//EN//HTML" "http://www.w3.org/TR/PR-html40/HTMLsymbol.ent"> %HTMLsymbol; <!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special//EN//HTML" "http://www.w3.org/TR/PR-html40/HTMLspecial.ent"> %HTMLspecial; <!--=================== Generic Attributes ===============================--> <!ENTITY % coreattrs "id ID #IMPLIED -- document-wide unique id -- class CDATA #IMPLIED -- space separated list of classes -- style CDATA #IMPLIED -- associated style info -- title %Text; #IMPLIED -- advisory title/amplification --" > <!ENTITY % i18n "lang %LanguageCode; #IMPLIED -- language code -- dir (ltr|rtl) #IMPLIED -- direction for weak/neutral text --" > <!ENTITY % events "onclick %Script; #IMPLIED -- a pointer button was clicked -- ondblclick %Script; #IMPLIED -- a pointer button was double clicked -- onmousedown %Script; #IMPLIED -- a pointer button was pressed down -- onmouseup %Script; #IMPLIED -- a pointer button was released -- onmouseover %Script; #IMPLIED -- a pointer was moved onto -- onmousemove %Script; #IMPLIED -- a pointer was moved within -- onmouseout %Script; #IMPLIED -- a pointer was moved away -- onkeypress %Script; #IMPLIED -- a key was pressed and released -- onkeydown %Script; #IMPLIED -- a key was pressed down -- onkeyup %Script; #IMPLIED -- a key was released --" > <!-- Reserved Feature Switch --> <!ENTITY % HTML.Reserved "IGNORE"> <!-- The following attributes are reserved for possible future use --> <![ %HTML.Reserved; [ <!ENTITY % reserved "datasrc %URL; #IMPLIED -- a single or tabular Data Source -- datafld CDATA #IMPLIED -- the property or column name -- dataformatas (plaintext|html) plaintext -- text or html --" > ]]> <!ENTITY % reserved ""> <!ENTITY % attrs "%coreattrs; %i18n; %events;"> <!ENTITY % align "align (left|center|right|justify) #IMPLIED" -- default is left for ltr paragraphs, right for rtl -- > <!--=================== Text Markup ======================================--> <!ENTITY % fontstyle "TT | I | B | U | S | STRIKE | BIG | SMALL"> <!ENTITY % phrase "EM | STRONG | DFN | CODE | SAMP | KBD | VAR | CITE | ABBR"> <!ENTITY % special "A | IMG | APPLET | OBJECT | FONT | BASEFONT | BR | SCRIPT | MAP | Q | SUB | SUP | SPAN | BDO | IFRAME"> <!ENTITY % formctrl "INPUT | SELECT | TEXTAREA | LABEL | BUTTON"> <!-- %inline; covers inline or "text-level" elements --> <!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl;"> <!ELEMENT (%fontstyle;|%phrase;) - - (%inline;)*> <!ATTLIST (%fontstyle;|%phrase;) %attrs; -- %coreattrs, %i18n, %events -- > <!ELEMENT (SUB|SUP) - - (%inline;)* -- subscript, superscript --> <!ATTLIST (SUB|SUP) %attrs; -- %coreattrs, %i18n, %events -- > <!ELEMENT SPAN - - (%inline;)* -- generic language/style container --> <!ATTLIST SPAN %attrs; -- %coreattrs, %i18n, %events -- charset %Charset; #IMPLIED -- char encoding of linked resource -- type %ContentType; #IMPLIED -- advisory content type -- href %URL; #IMPLIED -- URL for linked resource -- hreflang %LanguageCode; #IMPLIED -- language code -- target %FrameTarget; #IMPLIED -- render in this frame -- rel %LinkTypes; #IMPLIED -- forward link types -- rev %LinkTypes; #IMPLIED -- reverse link types -- media %MediaDesc; #IMPLIED -- for rendering on these media -- %reserved; -- reserved for possible future use -- > <!ELEMENT BDO - - (%inline;)* -- I18N BiDi over-ride --> <!ATTLIST BDO %coreattrs; -- id, class, style, title -- lang %LanguageCode; #IMPLIED -- language code -- dir (ltr|rtl) #REQUIRED -- directionality -- > <!ELEMENT BASEFONT - O EMPTY -- base font size --> <!ATTLIST BASEFONT id ID #IMPLIED -- document-wide unique id -- size CDATA #REQUIRED -- base font size for FONT elements -- color %Color; #IMPLIED -- "#RRGGBB" in hex, e.g. red: "#FF0000" -- face CDATA #IMPLIED -- comma separated list of font names -- > <!ELEMENT FONT - - (%inline;)* -- local change to font --> <!ATTLIST FONT %coreattrs; -- id, class, style, title -- %i18n; -- lang, dir -- size CDATA #IMPLIED -- [+|-]nn e.g. size="+1", size="4" -- color %Color; #IMPLIED -- "#RRGGBB" in hex, e.g. red: "#FF0000" -- face CDATA #IMPLIED -- comma separated list of font names -- > <!ELEMENT BR - O EMPTY -- forced line break --> <!ATTLIST BR %coreattrs; -- id, class, style, title -- clear (left|all|right|none) none -- control of text flow -- > <!--================== HTML content models ===============================--> <!-- HTML has two basic content models: %inline; character level elements and text strings %block; block-like elements e.g. paragraphs and lists --> <!ENTITY % block "P | %heading; | %list; | %preformatted; | DL | DIV | CENTER | NOSCRIPT | NOFRAMES | BLOCKQUOTE | FORM | ISINDEX | HR | TABLE | FIELDSET | ADDRESS"> <!ENTITY % flow "%block; | %inline;"> <!--=================== Document Body ====================================--> <!ELEMENT BODY O O (%flow;)* +(INS|DEL) -- document body --> <!ATTLIST BODY %attrs; -- %coreattrs, %i18n, %events -- background %URL; #IMPLIED -- texture tile for document background -- %bodycolors; -- bgcolor, text, link, vlink, alink -- onload %Script; #IMPLIED -- the document has been loaded -- onunload %Script; #IMPLIED -- the document has been removed -- > <!ELEMENT ADDRESS - - ((%inline;) | P)* -- information on author --> <!ATTLIST ADDRESS %attrs; -- %coreattrs, %i18n, %events -- > <!ELEMENT DIV - - (%flow;)* -- generic language/style container --> <!ATTLIST DIV %attrs; -- %coreattrs, %i18n, %events -- charset %Charset; #IMPLIED -- char encoding of linked resource -- type %ContentType; #IMPLIED -- advisory content type -- href %URL; #IMPLIED -- URL for linked resource -- hreflang %LanguageCode; #IMPLIED -- language code -- target %FrameTarget; #IMPLIED -- render in this frame -- rel %LinkTypes; #IMPLIED -- forward link types -- rev %LinkTypes; #IMPLIED -- reverse link types -- media %MediaDesc; #IMPLIED -- for rendering on these media -- %align; -- align, text alignment -- %reserved; -- reserved for possible future use -- > <!ELEMENT CENTER - - (%flow;)* -- shorthand for DIV with align=center --> <!ATTLIST CENTER %attrs; -- %coreattrs, %i18n, %events -- > <!--================== The Anchor Element ================================--> <!ENTITY % Shape "(rect|circle|poly|default)"> <!ENTITY % Coords "CDATA" -- comma separated list of numbers --> <!ELEMENT A - - (%inline;)* -(A) -- anchor --> <!ATTLIST A %attrs; -- %coreattrs, %i18n, %events -- charset %Charset; #IMPLIED -- char encoding of linked resource -- type %ContentType; #IMPLIED -- advisory content type -- name CDATA #IMPLIED -- named link end -- href %URL; #IMPLIED -- URL for linked resource -- hreflang %LanguageCode; #IMPLIED -- language code -- target %FrameTarget; #IMPLIED -- render in this frame -- rel %LinkTypes; #IMPLIED -- forward link types -- rev %LinkTypes; #IMPLIED -- reverse link types -- accesskey %Character; #IMPLIED -- accessibility key character -- shape %Shape; rect -- for use with OBJECT SHAPES -- coords %Coords; #IMPLIED -- for use with OBJECT SHAPES -- tabindex NUMBER #IMPLIED -- position in tabbing order -- onfocus %Script; #IMPLIED -- the element got the focus -- onblur %Script; #IMPLIED -- the element lost the focus -- > <!--================== Client-side image maps ============================--> <!-- These can be placed in the same document or grouped in a separate document although this isn't yet widely supported --> <!ELEMENT MAP - - (AREA)+ -- client-side image map --> <!ATTLIST MAP %attrs; -- %coreattrs, %i18n, %events -- name CDATA #REQUIRED -- name of image map for refs by usemap -- > <!ELEMENT AREA - O EMPTY -- client-side image map area --> <!ATTLIST AREA %attrs; -- %coreattrs, %i18n, %events -- shape %Shape; rect -- controls interpretation of coords -- coords %Coords; #IMPLIED -- comma separated list of lengths -- href %URL; #IMPLIED -- URL for linked resource -- target %FrameTarget; #IMPLIED -- render in this frame -- nohref (nohref) #IMPLIED -- this region has no action -- alt %Text; #REQUIRED -- short description -- tabindex NUMBER #IMPLIED -- position in tabbing order -- accesskey %Character; #IMPLIED -- accessibility key character -- onfocus %Script; #IMPLIED -- the element got the focus -- onblur %Script; #IMPLIED -- the element lost the focus -- > <!--================== The LINK Element ==================================--> <!-- Relationship values can be used in principle: a) for document specific toolbars/menus when used with the LINK element in document head e.g. start, contents, previous, next, index, end, help b) to link to a separate style sheet (rel=stylesheet) c) to make a link to a script (rel=script) d) by stylesheets to control how collections of html nodes are rendered into printed documents e) to make a link to a printable version of this document e.g. a postscript or pdf version (rel=alternate media=print) --> <!ELEMENT LINK - O EMPTY -- a media-independent link --> <!ATTLIST LINK %attrs; -- %coreattrs, %i18n, %events -- charset %Charset; #IMPLIED -- char encoding of linked resource -- href %URL; #IMPLIED -- URL for linked resource -- hreflang %LanguageCode; #IMPLIED -- language code -- type %ContentType; #IMPLIED -- advisory content type -- rel %LinkTypes; #IMPLIED -- forward link types -- rev %LinkTypes; #IMPLIED -- reverse link types -- media %MediaDesc; #IMPLIED -- for rendering on these media -- target %FrameTarget; #IMPLIED -- render in this frame -- > <!--=================== Images ===========================================--> <!-- Length defined in strict DTD for cellpadding/cellspacing --> <!ENTITY % Length "CDATA" -- nn for pixels or nn% for percentage length --> <!ENTITY % MultiLength "CDATA" -- pixel, percentage, or relative --> <!ENTITY % MultiLengths "CDATA" -- comma-separated list of MultiLength --> <!ENTITY % Pixels "CDATA" -- integer representing length in pixels --> <!ENTITY % IAlign "(top|middle|bottom|left|right)" -- center? --> <!-- To avoid problems with text-only UAs as well as to make image content understandable and navigable to users of non-visual UAs, you need to provide a description with ALT, and avoid server-side image maps --> <!ELEMENT IMG - O EMPTY -- Embedded image --> <!ATTLIST IMG %attrs; -- %coreattrs, %i18n, %events -- src %URL; #REQUIRED -- URL of image to embed -- alt %Text; #REQUIRED -- short description -- longdesc %URL; #IMPLIED -- link to long description (complements alt) -- height %Length; #IMPLIED -- override height -- width %Length; #IMPLIED -- override width -- align %IAlign; #IMPLIED -- vertical or horizontal alignment -- border %Length; #IMPLIED -- link border width -- hspace %Pixels; #IMPLIED -- horizontal gutter -- vspace %Pixels; #IMPLIED -- vertical gutter -- usemap %URL; #IMPLIED -- use client-side image map -- ismap (ismap) #IMPLIED -- use server-side image map -- > <!-- USEMAP points to a MAP element which may be in this document or an external document, although the latter is not widely supported --> <!--==================== OBJECT ======================================--> <!-- OBJECT is used to embed objects as part of HTML pages PARAM elements should precede other content. SGML mixed content model technicality precludes specifying this formally ... --> <!ELEMENT OBJECT - - (PARAM | %flow;)* -- generic embedded object --> <!ATTLIST OBJECT %attrs; -- %coreattrs, %i18n, %events -- declare (declare) #IMPLIED -- declare but don't instantiate flag -- classid %URL; #IMPLIED -- identifies an implementation -- codebase %URL; #IMPLIED -- base URL for classid, data, archive -- data %URL; #IMPLIED -- reference to object's data -- type %ContentType; #IMPLIED -- content type for data -- codetype %ContentType; #IMPLIED -- content type for code -- archive %URL; #IMPLIED -- space separated archive list -- standby %Text; #IMPLIED -- message to show while loading -- height %Length; #IMPLIED -- override height -- width %Length; #IMPLIED -- override width -- align %IAlign; #IMPLIED -- vertical or horizontal alignment -- border %Length; #IMPLIED -- link border width -- hspace %Pixels; #IMPLIED -- horizontal gutter -- vspace %Pixels; #IMPLIED -- vertical gutter -- usemap %URL; #IMPLIED -- use client-side image map -- shapes (shapes) #IMPLIED -- object has shaped hypertext links -- export (export) #IMPLIED -- export shapes to parent -- name CDATA #IMPLIED -- submit as part of form -- tabindex NUMBER #IMPLIED -- position in tabbing order -- %reserved; -- reserved for possible future use -- > <!ELEMENT PARAM - O EMPTY -- named property value --> <!ATTLIST PARAM id ID #IMPLIED -- document-wide unique id -- name CDATA #REQUIRED -- property name -- value CDATA #IMPLIED -- property value -- valuetype (DATA|REF|OBJECT) DATA -- How to interpret value -- type %ContentType; #IMPLIED -- content type for value when valuetype=ref -- > <!--=================== Java APPLET ==================================--> <!-- One of code or object attributes must be present. Place PARAM elements before other content. --> <!ELEMENT APPLET - - (PARAM | %flow;)* -- Java applet --> <!ATTLIST APPLET %coreattrs; -- id, class, style, title -- codebase %URL; #IMPLIED -- optional base URL for applet -- archive CDATA #IMPLIED -- comma separated archive list -- code CDATA #IMPLIED -- applet class file -- object CDATA #IMPLIED -- serialized applet file -- alt %Text; #IMPLIED -- short description -- name CDATA #IMPLIED -- allows applets to find each other -- width %Length; #REQUIRED -- initial width -- height %Length; #REQUIRED -- initial height -- align %IAlign; #IMPLIED -- vertical or horizontal alignment -- hspace %Pixels; #IMPLIED -- horizontal gutter -- vspace %Pixels; #IMPLIED -- vertical gutter -- > <!--=================== Horizontal Rule ==================================--> <!ELEMENT HR - O EMPTY -- horizontal rule --> <!ATTLIST HR %coreattrs; -- id, class, style, title -- %events; align (left|center|right) #IMPLIED noshade (noshade) #IMPLIED size %Pixels; #IMPLIED width %Length; #IMPLIED > <!--=================== Paragraphs =======================================--> <!ELEMENT P - O (%inline;)* -- paragraph --> <!ATTLIST P %attrs; -- %coreattrs, %i18n, %events -- %align; -- align, text alignment -- > <!--=================== Headings =========================================--> <!-- There are six levels of headings from H1 (the most important) to H6 (the least important). --> <!ELEMENT (%heading;) - - (%inline;)* -- heading --> <!ATTLIST (%heading;) %attrs; -- %coreattrs, %i18n, %events -- %align; -- align, text alignment -- > <!--=================== Preformatted Text ================================--> <!-- excludes markup for images and changes in font size --> <!ENTITY % pre.exclusion "IMG|OBJECT|APPLET|BIG|SMALL|SUB|SUP|FONT|BASEFONT"> <!ELEMENT PRE - - (%inline;)* -(%pre.exclusion;) -- preformatted text --> <!ATTLIST PRE %attrs; -- %coreattrs, %i18n, %events -- width NUMBER #IMPLIED > <!--===================== Inline Quotes ==================================--> <!ELEMENT Q - - (%inline;)* -- short inline quotation --> <!ATTLIST Q %attrs; -- %coreattrs, %i18n, %events -- cite %URL; #IMPLIED -- URL for source document or msg -- > <!--=================== Block-like Quotes ================================--> <!ELEMENT BLOCKQUOTE - - (%flow;)* -- long quotation --> <!ATTLIST BLOCKQUOTE %attrs; -- %coreattrs, %i18n, %events -- cite %URL; #IMPLIED -- URL for source document or msg -- > <!--=================== Inserted/Deleted Text ============================--> <!-- INS/DEL are handled by inclusion on BODY --> <!ELEMENT (INS|DEL) - - (%flow;)* -- inserted text, deleted text --> <!ATTLIST (INS|DEL) %attrs; -- %coreattrs, %i18n, %events -- cite %URL; #IMPLIED -- info on reason for change -- datetime %Datetime; #IMPLIED -- date and time of change -- > <!--=================== Lists ============================================--> <!-- definition lists - DT for term, DD for its definition --> <!ELEMENT DL - - (DT|DD)+ -- definition list --> <!ATTLIST DL %attrs; -- %coreattrs, %i18n, %events -- compact (compact) #IMPLIED -- reduced interitem spacing -- > <!ELEMENT DT - O (%inline;)* -- definition term --> <!ELEMENT DD - O (%flow;)* -- definition description --> <!ATTLIST (DT|DD) %attrs; -- %coreattrs, %i18n, %events -- > <!-- Ordered lists (OL) Numbering style 1 arablic numbers 1, 2, 3, ... a lower alpha a, b, c, ... A upper alpha A, B, C, ... i lower roman i, ii, iii, ... I upper roman I, II, III, ... The style is applied to the sequence number which by default is reset to 1 for the first list item in an ordered list. This can't be expressed directly in SGML due to case folding. --> <!ENTITY % OLStyle "CDATA" -- constrained to: "(1|a|A|i|I)" --> <!ELEMENT OL - - (LI)+ -- ordered list --> <!ATTLIST OL %attrs; -- %coreattrs, %i18n, %events -- type %OLStyle; #IMPLIED -- numbering style -- compact (compact) #IMPLIED -- reduced interitem spacing -- start NUMBER #IMPLIED -- starting sequence number -- > <!-- Unordered Lists (UL) bullet styles --> <!ENTITY % ULStyle "(disc|square|circle)"> <!ELEMENT UL - - (LI)+ -- unordered list --> <!ATTLIST UL %attrs; -- %coreattrs, %i18n, %events -- type %ULStyle; #IMPLIED -- bullet style -- compact (compact) #IMPLIED -- reduced interitem spacing -- > <!ELEMENT (DIR|MENU) - - (LI)+ -(%block;) -- directory list, menu list --> <!ATTLIST DIR %attrs; -- %coreattrs, %i18n, %events -- compact (compact) #IMPLIED > <!ATTLIST MENU %attrs; -- %coreattrs, %i18n, %events -- compact (compact) #IMPLIED > <!-- <DIR> Directory list --> <!-- <DIR COMPACT> Compact list style --> <!-- <MENU> Menu list --> <!-- <MENU COMPACT> Compact list style --> <!ENTITY % LIStyle "CDATA" -- constrained to: "(%ULStyle;|%OLStyle;)" --> <!ELEMENT LI - O (%flow;)* -- list item --> <!ATTLIST LI %attrs; -- %coreattrs, %i18n, %events -- type %LIStyle; #IMPLIED -- list item style -- value NUMBER #IMPLIED -- reset sequence number -- > <!--================ Forms ===============================================--> <!ELEMENT FORM - - (%flow;)* -(FORM) -- interactive form --> <!ATTLIST FORM %attrs; -- %coreattrs, %i18n, %events -- action %URL; #REQUIRED -- server-side form handler -- method (GET|POST) GET -- HTTP method used to submit the form -- enctype %ContentType; "application/x-www-form-urlencoded" onsubmit %Script; #IMPLIED -- the form was submitted -- onreset %Script; #IMPLIED -- the form was reset -- target %FrameTarget; #IMPLIED -- render in this frame -- accept-charset %Charsets; #IMPLIED -- list of supported charsets -- > <!-- Each label must not contain more than ONE field --> <!ELEMENT LABEL - - (%inline;)* -(LABEL) -- form field label text --> <!ATTLIST LABEL %attrs; -- %coreattrs, %i18n, %events -- for IDREF #IMPLIED -- matches field ID value -- accesskey %Character; #IMPLIED -- accessibility key character -- onfocus %Script; #IMPLIED -- the element got the focus -- onblur %Script; #IMPLIED -- the element lost the focus -- > <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX | RADIO | SUBMIT | RESET | FILE | HIDDEN | IMAGE | BUTTON)" > <!-- attribute name required for all but submit & reset --> <!ELEMENT INPUT - O EMPTY -- form control --> <!ATTLIST INPUT %attrs; -- %coreattrs, %i18n, %events -- type %InputType; TEXT -- what kind of widget is needed -- name CDATA #IMPLIED -- submit as part of form -- value CDATA #IMPLIED -- required for radio and checkboxes -- checked (checked) #IMPLIED -- for radio buttons and check boxes -- disabled (disabled) #IMPLIED -- control is unavailable in this context -- readonly (readonly) #IMPLIED -- for text and passwd -- size CDATA #IMPLIED -- specific to each type of field -- maxlength NUMBER #IMPLIED -- max chars for text fields -- src %URL; #IMPLIED -- for fields with images -- alt CDATA #IMPLIED -- short description -- usemap %URL; #IMPLIED -- use client-side image map -- align %IAlign; #IMPLIED -- vertical or horizontal alignment -- tabindex NUMBER #IMPLIED -- position in tabbing order -- accesskey %Character; #IMPLIED -- accessibility key character -- onfocus %Script; #IMPLIED -- the element got the focus -- onblur %Script; #IMPLIED -- the element lost the focus -- onselect %Script; #IMPLIED -- some text was selected -- onchange %Script; #IMPLIED -- the element value was changed -- accept %ContentTypes; #IMPLIED -- list of MIME types for file upload -- %reserved; -- reserved for possible future use -- > <!ELEMENT SELECT - - (OPTGROUP|OPTION)+ -- option selector --> <!ATTLIST SELECT %attrs; -- %coreattrs, %i18n, %events -- name CDATA #IMPLIED -- field name -- size NUMBER #IMPLIED -- rows visible -- multiple (multiple) #IMPLIED -- default is single selection -- disabled (disabled) #IMPLIED -- control is unavailable in this context -- tabindex NUMBER #IMPLIED -- position in tabbing order -- onfocus %Script; #IMPLIED -- the element got the focus -- onblur %Script; #IMPLIED -- the element lost the focus -- onchange %Script; #IMPLIED -- the element value was changed -- %reserved; -- reserved for possible future use -- > <!ELEMENT OPTGROUP - - (OPTGROUP|OPTION)+ -- option group --> <!ATTLIST OPTGROUP %attrs; -- %coreattrs, %i18n, %events -- disabled (disabled) #IMPLIED -- control is unavailable in this context -- label %Text; #REQUIRED -- for use in hierarchical menus -- > <!ELEMENT OPTION - O (#PCDATA) -- selectable choice --> <!ATTLIST OPTION %attrs; -- %coreattrs, %i18n, %events -- selected (selected) #IMPLIED disabled (disabled) #IMPLIED -- control is unavailable in this context -- label %Text; #IMPLIED -- for use in hierarchical menus -- value CDATA #IMPLIED -- defaults to element content -- > <!ELEMENT TEXTAREA - - (#PCDATA) -- multi-line text field --> <!ATTLIST TEXTAREA %attrs; -- %coreattrs, %i18n, %events -- name CDATA #IMPLIED rows NUMBER #REQUIRED cols NUMBER #REQUIRED disabled (disabled) #IMPLIED -- control is unavailable in this context -- readonly (readonly) #IMPLIED tabindex NUMBER #IMPLIED -- position in tabbing order -- onfocus %Script; #IMPLIED -- the element got the focus -- onblur %Script; #IMPLIED -- the element lost the focus -- onselect %Script; #IMPLIED -- some text was selected -- onchange %Script; #IMPLIED -- the element value was changed -- %reserved; -- reserved for possible future use -- > <!-- #PCDATA is to solve the mixed content problem, per specification only whitespace is allowed there! --> <!ELEMENT FIELDSET - - (#PCDATA,LEGEND,(%flow;)*) -- form control group --> <!ATTLIST FIELDSET %attrs; -- %coreattrs, %i18n, %events -- > <!ELEMENT LEGEND - - (%inline;)* -- fieldset legend --> <!ENTITY % LAlign "(top|bottom|left|right)"> <!ATTLIST LEGEND %attrs; -- %coreattrs, %i18n, %events -- align %LAlign; #IMPLIED -- relative to fieldset -- accesskey %Character; #IMPLIED -- accessibility key character -- > <!ELEMENT BUTTON - - (%flow;)* -(A|%formctrl;|FORM|ISINDEX|FIELDSET|IFRAME) -- push button --> <!ATTLIST BUTTON %attrs; -- %coreattrs, %i18n, %events -- name CDATA #IMPLIED -- for scripting/forms as submit button -- value CDATA #IMPLIED -- gets passed to server when submitted -- type (button|submit|reset) submit -- for use as form submit/reset button -- disabled (disabled) #IMPLIED -- control is unavailable in this context -- tabindex NUMBER #IMPLIED -- position in tabbing order -- accesskey %Character; #IMPLIED -- accessibility key character -- onfocus %Script; #IMPLIED -- the element got the focus -- onblur %Script; #IMPLIED -- the element lost the focus -- %reserved; -- reserved for possible future use -- > <!--======================= Tables =======================================--> <!-- IETF HTML table standard, see [RFC1942] --> <!-- The BORDER attribute sets the thickness of the frame around the table. The default units are screen pixels. The FRAME attribute specifies which parts of the frame around the table should be rendered. The values are not the same as CALS to avoid a name clash with the VALIGN attribute. The value "border" is included for backwards compatibility with <TABLE BORDER> which yields frame=border and border=implied For <TABLE BORDER=1> you get border=1 and frame=implied. In this case, it is appropriate to treat this as frame=border for backwards compatibility with deployed browsers. --> <!ENTITY % TFrame "(void|above|below|hsides|lhs|rhs|vsides|box|border)"> <!-- The RULES attribute defines which rules to draw between cells: If RULES is absent then assume: "none" if BORDER is absent or BORDER=0 otherwise "all" --> <!ENTITY % TRules "(none | groups | rows | cols | all)"> <!-- horizontal placement of table relative to document --> <!ENTITY % TAlign "(left|center|right)"> <!-- horizontal alignment attributes for cell contents --> <!ENTITY % cellhalign "align (left|center|right|justify|char) #IMPLIED char %Character; #IMPLIED -- alignment char, e.g. char=':' -- charoff %Length; #IMPLIED -- offset for alignment char --" > <!-- vertical alignment attributes for cell contents --> <!ENTITY % cellvalign "valign (top|middle|bottom|baseline) #IMPLIED" > <!ELEMENT TABLE - - (CAPTION?, (COL*|COLGROUP*), THEAD?, TFOOT?, TBODY+)> <!ELEMENT CAPTION - - (%inline;)* -- table caption --> <!ELEMENT THEAD - O (TR)+ -- table header --> <!ELEMENT TFOOT - O (TR)+ -- table footer --> <!ELEMENT TBODY O O (TR)+ -- table body --> <!ELEMENT COLGROUP - O (col)* -- table column group --> <!ELEMENT COL - O EMPTY -- table column --> <!ELEMENT TR - O (TH|TD)+ -- table row --> <!ELEMENT (TH|TD) - O (%flow;)* -- table header cell, table data cell --> <!ATTLIST TABLE -- table element -- %attrs; -- %coreattrs, %i18n, %events -- summary %Text; #IMPLIED -- purpose/structure for speech output -- align %TAlign; #IMPLIED -- table position relative to window -- bgcolor %Color; #IMPLIED -- background color for cells -- width %Pixels; #IMPLIED -- table width relative to window -- border CDATA #IMPLIED -- controls frame width around table -- frame %TFrame; #IMPLIED -- which parts of table frame to include -- rules %TRules; #IMPLIED -- rulings between rows and cols -- cellspacing %Length; #IMPLIED -- spacing between cells -- cellpadding %Length; #IMPLIED -- spacing within cells -- %reserved; -- reserved for possible future use -- > <!ENTITY % CAlign "(top|bottom|left|right)"> <!ATTLIST CAPTION %attrs; -- %coreattrs, %i18n, %events -- align %CAlign; #IMPLIED -- relative to table -- > <!-- COLGROUP groups a set of COL elements. It allows you to group several semantically related columns together. --> <!ATTLIST COLGROUP %attrs; -- %coreattrs, %i18n, %events -- span NUMBER 1 -- default number of columns in group -- width %MultiLength; #IMPLIED -- default width for enclosed COLs -- %cellhalign; -- horizontal alignment in cells -- %cellvalign; -- vertical alignment in cells -- > <!-- COL elements define the alignment properties for cells in one or more columns. The WIDTH attribute specifies the width of the columns, e.g. width=64 width in screen pixels width=0.5* relative width of 0.5 The REPEAT attribute allows you to repeat the effects of a COL element as if the same element was repeated n times. There are no grouping semantics for repeated columns. --> <!ATTLIST COL -- column groups and properties -- %attrs; -- %coreattrs, %i18n, %events -- repeat NUMBER 1 -- repeat count for COL -- width %MultiLength; #IMPLIED -- column width specification -- %cellhalign; -- horizontal alignment in cells -- %cellvalign; -- vertical alignment in cells -- > <!-- Use THEAD to duplicate headers when breaking table across page boundaries, or for static headers when TBODY sections are rendered in scrolling panel. Use TFOOT to duplicate footers when breaking table across page boundaries, or for static footers when TBODY sections are rendered in scrolling panel. Use multiple TBODY sections when rules are needed between groups of table rows. --> <!ATTLIST (THEAD|TBODY|TFOOT) -- table section -- %attrs; -- %coreattrs, %i18n, %events -- %cellhalign; -- horizontal alignment in cells -- %cellvalign; -- vertical alignment in cells -- > <!ATTLIST TR -- table row -- %attrs; -- %coreattrs, %i18n, %events -- %cellhalign; -- horizontal alignment in cells -- %cellvalign; -- vertical alignment in cells -- bgcolor %Color; #IMPLIED -- background color for row -- > <!-- Scope is simpler than axes attribute for common tables --> <!ENTITY % Scope "(row|col|rowgroup|colgroup)"> <!-- TH is for headers, TD for data, but for cells acting as both use TD --> <!ATTLIST (TH|TD) -- header or data cell -- %attrs; -- %coreattrs, %i18n, %events -- abbr %Text; #IMPLIED -- abbreviation for header cell -- axis CDATA #IMPLIED -- names groups of related headers-- headers IDREFS #IMPLIED -- list of id's for header cells -- scope %Scope; #IMPLIED -- scope covered by header cells -- nowrap (nowrap) #IMPLIED -- suppress word wrap -- bgcolor %Color; #IMPLIED -- cell background color -- rowspan NUMBER 1 -- number of rows spanned by cell -- colspan NUMBER 1 -- number of cols spanned by cell -- %cellhalign; -- horizontal alignment in cells -- %cellvalign; -- vertical alignment in cells -- width %Pixels; #IMPLIED -- width for cell -- height %Pixels; #IMPLIED -- height for cell -- > <!--================== Document Frames ===================================--> <!-- The content model for HTML documents depends on whether the HEAD is followed by a FRAMESET or BODY element. The widespread omission of the BODY start tag makes it impractical to define the content model without the use of a marked section. --> <!-- Feature Switch for frameset documents --> <!ENTITY % HTML.Frameset "IGNORE"> <![ %HTML.Frameset; [ <!ELEMENT FRAMESET - - ((FRAMESET|FRAME)+ & NOFRAMES?) -- window subdivision --> <!ATTLIST FRAMESET %coreattrs; -- id, class, style, title -- rows %MultiLengths; #IMPLIED -- list of lengths. Default: 100% (1 row) -- cols %MultiLengths; #IMPLIED -- list of lengths. Default: 100% (1 col) -- onload %Script; #IMPLIED -- all the frames have been loaded -- onunload %Script; #IMPLIED -- all the frames have been removed -- > ]]> <![ %HTML.Frameset; [ <!-- reserved frame names start with "_" otherwise starts with letter --> <!ELEMENT FRAME - O EMPTY -- subwindow --> <!ATTLIST FRAME %coreattrs; -- id, class, style, title -- longdesc %URL; #IMPLIED -- link to long description (complements title) -- name CDATA #IMPLIED -- name of frame for targetting -- src %URL; #IMPLIED -- source of frame content -- frameborder (1|0) 1 -- request frame borders? -- marginwidth %Pixels; #IMPLIED -- margin widths in pixels -- marginheight %Pixels; #IMPLIED -- margin height in pixels -- noresize (noresize) #IMPLIED -- allow users to resize frames? -- scrolling (yes|no|auto) auto -- scrollbar or none -- > ]]> <!ELEMENT IFRAME - - (%flow;)* -- inline subwindow --> <!ATTLIST IFRAME %coreattrs; -- id, class, style, title -- longdesc %URL; #IMPLIED -- link to long description (complements title) -- name CDATA #IMPLIED -- name of frame for targetting -- src %URL; #IMPLIED -- source of frame content -- frameborder (1|0) 1 -- request frame borders? -- marginwidth %Pixels; #IMPLIED -- margin widths in pixels -- marginheight %Pixels; #IMPLIED -- margin height in pixels -- scrolling (yes|no|auto) auto -- scrollbar or none -- align %IAlign; #IMPLIED -- vertical or horizontal alignment -- height %Length; #IMPLIED -- frame height -- width %Length; #IMPLIED -- frame width -- > <![ %HTML.Frameset; [ <!ENTITY % noframes.content "(BODY) -(NOFRAMES)"> ]]> <!ENTITY % noframes.content "(%flow;)*"> <!ELEMENT NOFRAMES - - %noframes.content; -- alternate content container for non frame-based rendering --> <!ATTLIST NOFRAMES %attrs; -- %coreattrs, %i18n, %events -- > <!--================ Document Head =======================================--> <!-- %head.misc; defined earlier on as "SCRIPT | STYLE | META | LINK" --> <!ENTITY % head.content "TITLE & ISINDEX? & BASE?"> <!ELEMENT HEAD O O (%head.content;) +(%head.misc;) -- document head --> <!ATTLIST HEAD %i18n; -- lang, dir -- profile %URL; #IMPLIED -- named dictionary of meta info -- > <!-- The TITLE element is not considered part of the flow of text. It should be displayed, for example as the page header or window title. Exactly one title is required per document. --> <!ELEMENT TITLE - - (#PCDATA) -(%head.misc;) -- document title --> <!ATTLIST TITLE %i18n> <!ELEMENT ISINDEX - O EMPTY -- single line prompt --> <!ATTLIST ISINDEX %coreattrs; -- id, class, style, title -- %i18n; -- lang, dir -- prompt %Text; #IMPLIED -- prompt message --> <!ELEMENT BASE - O EMPTY -- document base URL --> <!ATTLIST BASE href %URL; #IMPLIED -- URL that acts as base URL -- target %FrameTarget; #IMPLIED -- render in this frame -- > <!ELEMENT META - O EMPTY -- generic metainformation --> <!ATTLIST META %i18n; -- lang, dir, for use with content string -- http-equiv NAME #IMPLIED -- HTTP response header name -- name NAME #IMPLIED -- metainformation name -- content CDATA #REQUIRED -- associated information -- scheme CDATA #IMPLIED -- select form of content -- > <!ELEMENT STYLE - - CDATA -- style info --> <!ATTLIST STYLE %i18n; -- lang, dir, for use with title -- type %ContentType; #REQUIRED -- content type of style language -- media %MediaDesc; #IMPLIED -- designed for use with these media -- title %Text; #IMPLIED -- advisory title -- > <!ELEMENT SCRIPT - - CDATA -- script statements --> <!ATTLIST SCRIPT charset %Charset; #IMPLIED -- char encoding of linked resource -- type %ContentType; #REQUIRED -- content type of script language -- language CDATA #IMPLIED -- predefined script language name -- src %URL; #IMPLIED -- URL for an external script -- defer (defer) #IMPLIED -- UA may defer execution of script -- > <!ELEMENT NOSCRIPT - - (%flow;)* -- alternate content container for non script-based rendering --> <!ATTLIST NOSCRIPT %attrs; -- %coreattrs, %i18n, %events -- > <!--================ Document Structure ==================================--> <!ENTITY % version "version CDATA #FIXED '%HTML.Version;'"> <![ %HTML.Frameset; [ <!ENTITY % html.content "HEAD, FRAMESET"> ]]> <!ENTITY % html.content "HEAD, BODY"> <!ELEMENT HTML O O (%html.content;) -- document root element --> <!ATTLIST HTML %version; %i18n; -- lang, dir -- >Title: XML4C2 Readme
|