[ https://issues.apache.org/jira/browse/TIKA-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763976#action_12763976 ]
Benson Margulies commented on TIKA-303:
---------------------------------------

Feed in any HTML page that already has a title. First the regular startDocument will be called, then the document's html/head/title will be produced. Then lazyStartDocument will add another layer. You get

<html>
<head>
<title>title</title>
</head>
<body>
<html>
<head><title>...</title></head><body>
the body
</body>
</html>
</body>
</html>

I'll attach a code example later on.

> XHTMLContentHandler mishandles headers
> --------------------------------------
>
>                 Key: TIKA-303
>                 URL: https://issues.apache.org/jira/browse/TIKA-303
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.4
>            Reporter: Benson Margulies
>
> XHTMLContentHandler.startDocument does not note that it has been called. So
> then lazyStartDocument will happen and embed an extra layer of
> head/title/body processing.
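
The failure mode described in the issue can be sketched in isolation, without Tika on the classpath. The following is a minimal illustration, not Tika's actual code: SketchHandler, Recorder, emitPrologue and the rememberStart switch are hypothetical names, and the handler is a deliberately simplified stand-in for XHTMLContentHandler. It assumes that startDocument emits the html/head/title/body prologue and that lazyStartDocument emits it again unless a flag says the document has already been started.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class LazyStartSketch {

    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Records the local name of every start tag it receives. */
    static class Recorder extends DefaultHandler {
        final List<String> starts = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            starts.add(localName);
        }
    }

    /** Hypothetical, simplified stand-in for XHTMLContentHandler. */
    static class SketchHandler extends DefaultHandler {
        private final ContentHandler out;
        private final boolean rememberStart; // false models the reported bug
        private boolean started = false;

        SketchHandler(ContentHandler out, boolean rememberStart) {
            this.out = out;
            this.rememberStart = rememberStart;
        }

        /** Emits <html><head><title/></head><body> to the wrapped handler. */
        private void emitPrologue() throws SAXException {
            AttributesImpl none = new AttributesImpl();
            out.startElement(XHTML, "html", "html", none);
            out.startElement(XHTML, "head", "head", none);
            out.startElement(XHTML, "title", "title", none);
            out.endElement(XHTML, "title", "title");
            out.endElement(XHTML, "head", "head");
            out.startElement(XHTML, "body", "body", none);
        }

        @Override
        public void startDocument() throws SAXException {
            out.startDocument();
            emitPrologue();
            if (rememberStart) {
                started = true; // the step the issue says is missing
            }
        }

        /** Emits the prologue only if the flag says it has not been emitted. */
        private void lazyStartDocument() throws SAXException {
            if (!started) {
                started = true;
                emitPrologue();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            lazyStartDocument();
            out.characters(ch, start, length);
        }
    }

    /** Drives one startDocument plus one text event and returns the start tags seen. */
    static List<String> run(boolean rememberStart) {
        Recorder recorder = new Recorder();
        SketchHandler handler = new SketchHandler(recorder, rememberStart);
        try {
            handler.startDocument();
            char[] text = "the body".toCharArray();
            handler.characters(text, 0, text.length);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return recorder.starts;
    }

    public static void main(String[] args) {
        System.out.println("without flag: " + run(false));
        System.out.println("with flag:    " + run(true));
    }
}
```

In this sketch, run(false) records the html/head/title/body sequence twice, which is the extra layer shown in the comment, while run(true) records it once. Whether the real fix belongs in startDocument or elsewhere in XHTMLContentHandler is not something this toy model can settle.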