[ https://issues.apache.org/jira/browse/TIKA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253511#comment-17253511 ]
Nick Burch commented on TIKA-3254:
----------------------------------

Tika tries to give you clean, semantically meaningful XHTML. It deliberately doesn't give you the "Word -> Save As -> HTML" fully-featured-mess...

> Html font styles missing - doc to html
> --------------------------------------
>
>                 Key: TIKA-3254
>                 URL: https://issues.apache.org/jira/browse/TIKA-3254
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Sathia
>            Priority: Major
>         Attachments: Sample.doc
>
>
> Hi Team,
> I tried to convert a doc to XHTML using Tika. The conversion is successful,
> but the styles are missing.
>
> Attached is *sample.doc*, which I used. Below is the code I used for the
> conversion.
>
> public String parseToHTML() throws IOException, SAXException, TikaException {
>     ContentHandler handler = new ToXMLContentHandler();
>
>     AutoDetectParser parser = new AutoDetectParser();
>     Metadata metadata = new Metadata();
>     try (InputStream stream = ContentHandlerExample.class.getResourceAsStream("test.doc")) {
>         parser.parse(stream, handler, metadata);
>         return handler.toString();
>     }
> }
>
> Regards,
> Sathia

--
This message was sent by Atlassian Jira
(v8.3.4#803005)