[ https://issues.apache.org/jira/browse/TIKA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253511#comment-17253511 ]

Nick Burch commented on TIKA-3254:
----------------------------------

Tika tries to give you clean, semantically meaningful XHTML. 

It deliberately doesn't give you the "Word -> Save As -> HTML" 
fully-featured-mess...
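To make the difference concrete, here is a small sketch that uses only the JDK's built-in DOM parser. Both XHTML fragments are hypothetical and written for illustration. Neither is real output from Tika or from Word. The class name SemanticVsPresentational and the helper countPresentationalMarkup are also made up for this example. The sketch counts presentational markup in each fragment, meaning inline style attributes and <font> elements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class SemanticVsPresentational {

    // Hypothetical fragment with no inline styling, in the spirit of Tika's XHTML.
    static final String SEMANTIC =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<h1>Report</h1><p>Some <b>bold</b> text.</p>"
            + "</body></html>";

    // Hypothetical fragment with formatting inlined, in the spirit of Word's HTML export.
    static final String PRESENTATIONAL =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<p style=\"font-family:Calibri;font-size:11pt\">"
            + "<font color=\"#1F3864\"><span style=\"font-weight:bold\">Report</span></font>"
            + "</p></body></html>";

    /** Counts elements that carry an inline style attribute or are font elements. */
    static int countPresentationalMarkup(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            // "*", "*" matches every element in every namespace
            NodeList all = doc.getElementsByTagNameNS("*", "*");
            int count = 0;
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if (e.hasAttribute("style") || "font".equals(e.getLocalName())) {
                    count++;
                }
            }
            return count;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse XHTML", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("semantic: " + countPresentationalMarkup(SEMANTIC));
        System.out.println("presentational: " + countPresentationalMarkup(PRESENTATIONAL));
    }
}
```

Running main prints 0 for the semantic fragment. It prints 3 for the presentational one, counting the styled <p>, the <font>, and the styled <span>. Font and colour information lives in exactly the kind of markup that Tika's output leaves out.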

> Html font styles missing - doc to html
> --------------------------------------
>
>                 Key: TIKA-3254
>                 URL: https://issues.apache.org/jira/browse/TIKA-3254
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Sathia
>            Priority: Major
>         Attachments: Sample.doc
>
>
> Hi Team,
> I tried converting a .doc file to XHTML using Tika. The conversion is
> successful, but the styles are missing.
>  
> I have attached *Sample.doc*, the file I used. Below is the code I used for
> the conversion.
>  
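> {code:java}
> import java.io.IOException;
> import java.io.InputStream;
> import org.apache.tika.exception.TikaException;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.sax.ToXMLContentHandler;
> import org.xml.sax.ContentHandler;
> import org.xml.sax.SAXException;
>
> public class ContentHandlerExample {
>     public String parseToHTML() throws IOException, SAXException, TikaException {
>         // ToXMLContentHandler serializes the parser's SAX events into an XHTML string
>         ContentHandler handler = new ToXMLContentHandler();
>
>         AutoDetectParser parser = new AutoDetectParser();
>         Metadata metadata = new Metadata();
>         try (InputStream stream = ContentHandlerExample.class.getResourceAsStream("test.doc")) {
>             parser.parse(stream, handler, metadata);
>             return handler.toString();
>         }
>     }
> }
> {code}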
>  
> Regards,
> Sathia



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
