Lazy XHTML prefix generation
----------------------------

                 Key: TIKA-131
                 URL: https://issues.apache.org/jira/browse/TIKA-131
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Jukka Zitting
            Assignee: Jukka Zitting
            Priority: Minor


The XHTMLContentHandler utility class is used by many Tika parsers to generate 
XHTML output. Among other things, the XHTMLContentHandler automatically 
generates the following XHTML skeleton:

    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>...</title>
      </head>
      <body>
        ...
      </body>
    </html>

The <title/> tag (and potentially other metadata in the future) is populated 
from the Metadata.TITLE property of the document being parsed. Unfortunately, 
that metadata is often not yet available when XHTML generation starts, because 
a typical usage pattern is:

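    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();  // the skeleton, including <title>, is generated here
    // parse the document; metadata such as the title is often found only now
    xhtml.endDocument();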

We can avoid the problem in many cases by postponing XHTML prefix generation 
until the parser actually starts producing SAX events.
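
The idea can be illustrated with a minimal, self-contained sketch. This is not 
the XHTMLContentHandler code and not a proposed patch: the class name 
LazyPrefixSketch, the Recorder helper, the simplified method signatures, and 
the plain Map standing in for Tika's Metadata object are all assumptions made 
for illustration. Only the org.xml.sax types are real APIs. The sketch emits 
nothing in startDocument() and writes the prefix the first time any content 
event arrives, or in endDocument() for documents with no content.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch, not the Tika class: defers the XHTML prefix until first output. */
class LazyPrefixSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    private final ContentHandler out;
    private final Map<String, String> metadata; // stand-in for Tika's Metadata (assumption)
    private boolean prefixWritten = false;

    LazyPrefixSketch(ContentHandler out, Map<String, String> metadata) {
        this.out = out;
        this.metadata = metadata;
    }

    /** Intentionally emits nothing: the title may not be known yet. */
    void startDocument() { }

    /** Emits html, head, title and the opening body tag exactly once, on first use. */
    private void writePrefixIfNeeded() throws SAXException {
        if (prefixWritten) return;
        prefixWritten = true;
        Attributes none = new AttributesImpl();
        out.startDocument();
        out.startPrefixMapping("", XHTML);
        out.startElement(XHTML, "html", "html", none);
        out.startElement(XHTML, "head", "head", none);
        out.startElement(XHTML, "title", "title", none);
        String title = metadata.get("title"); // read as late as possible
        if (title != null) out.characters(title.toCharArray(), 0, title.length());
        out.endElement(XHTML, "title", "title");
        out.endElement(XHTML, "head", "head");
        out.startElement(XHTML, "body", "body", none);
    }

    void startElement(String name) throws SAXException {
        writePrefixIfNeeded();
        out.startElement(XHTML, name, name, new AttributesImpl());
    }

    void characters(String text) throws SAXException {
        writePrefixIfNeeded();
        out.characters(text.toCharArray(), 0, text.length());
    }

    void endElement(String name) throws SAXException {
        out.endElement(XHTML, name, name);
    }

    void endDocument() throws SAXException {
        writePrefixIfNeeded(); // documents without content still get the skeleton
        out.endElement(XHTML, "body", "body");
        out.endElement(XHTML, "html", "html");
        out.endPrefixMapping("");
        out.endDocument();
    }

    /** Records a flat text trace of the SAX events it receives (demo helper). */
    static class Recorder extends DefaultHandler {
        final StringBuilder trace = new StringBuilder();
        @Override public void startElement(String uri, String local, String qName, Attributes a) {
            trace.append('<').append(local).append('>');
        }
        @Override public void endElement(String uri, String local, String qName) {
            trace.append("</").append(local).append('>');
        }
        @Override public void characters(char[] ch, int start, int length) {
            trace.append(ch, start, length);
        }
    }

    public static void main(String[] args) throws SAXException {
        Map<String, String> metadata = new HashMap<>();
        Recorder recorder = new Recorder();
        LazyPrefixSketch xhtml = new LazyPrefixSketch(recorder, metadata);
        xhtml.startDocument();
        metadata.put("title", "Late title"); // title discovered during parsing
        xhtml.startElement("p");
        xhtml.characters("Hello");
        xhtml.endElement("p");
        xhtml.endDocument();
        System.out.println(recorder.trace);
    }
}
```

In this sketch the title is set after startDocument() but before the first 
content event, so it still ends up in the generated <title/> element. A title 
discovered only after the first content event would still be missed, which is 
why the deferral helps in many cases rather than all of them.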

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
