[
https://issues.apache.org/jira/browse/ODFTOOLKIT-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743327#comment-14743327
]
Michael Stahl commented on ODFTOOLKIT-400:
------------------------------------------
It's a bit of a mystery to me what the point of this is.
The whole point of using an XML parser is that you don't have to care about
which charset and encoding the source document uses: it's all
abstracted away, and you only have to deal with nice, uniform Unicode text in
your application.
Specifically, the Java XML parsers all create java.lang.Strings, which are
*always* UTF-16 encoded Unicode.
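To illustrate, here is a minimal sketch using only the JDK's built-in DOM parser (not ODFDOM itself). The class name ParserDecodesForYou and the helper parseText are made up for this example. It parses the same text from two differently encoded byte streams; the parser reads the encoding from the XML declaration, so the application should see identical Strings either way.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the ODF Toolkit.
class ParserDecodesForYou {

    // Encodes the XML text with the given charset, parses the raw bytes,
    // and returns the text content of the root element as a Java String.
    static String parseText(String xml, String charsetName) throws Exception {
        byte[] bytes = xml.getBytes(charsetName);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><p>caf\u00e9</p>";
        String utf8 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>caf\u00e9</p>";

        String fromLatin1 = parseText(latin1, "ISO-8859-1");
        String fromUtf8 = parseText(utf8, "UTF-8");

        // The source encodings differ, but the decoded Strings should not.
        System.out.println(fromLatin1.equals(fromUtf8));
    }
}
```

The encoding of the source file only matters to the parser; once the DOM is built, the application never needs to ask for it.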
So if you want to export UTF-8 encoded HTML, *just do it* by encoding the
strings at the point where you write them into the generated file.
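A rough sketch of what that could look like, assuming the text has already been extracted from the document as a String. Utf8HtmlWriter, writeHtml and escape are hypothetical names for this example, not ODF Toolkit API; the only real work is done by wrapping the output stream in an OutputStreamWriter with an explicit UTF-8 charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the ODF Toolkit.
class Utf8HtmlWriter {

    private static final Charset UTF_8 = Charset.forName("UTF-8");

    // Minimal escaping of the characters that are special in HTML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Writes a tiny HTML page containing the text. The Strings are encoded
    // as UTF-8 here, at the point of output, and the page declares the
    // same charset so the two always agree.
    static void writeHtml(String text, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, UTF_8);
        w.write("<!DOCTYPE html>\n<html><head><meta charset=\"UTF-8\"></head><body><p>");
        w.write(escape(text));
        w.write("</p></body></html>\n");
        w.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeHtml("Gr\u00fc\u00dfe", buf);
        System.out.println(buf.size() + " bytes of UTF-8 encoded HTML");
    }
}
```

The charset declared in the HTML is then a property of the writer, not of the .odt input, so there is nothing to look up in the source document.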
> Unable to obtain the charset encoding of an odt document
> --------------------------------------------------------
>
> Key: ODFTOOLKIT-400
> URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-400
> Project: ODF Toolkit
> Issue Type: Bug
> Components: odfdom
> Environment: linux - ubuntu 14.04
> Reporter: Joshua
> Attachments: 400-part1-pom_xml-FromJava1_5To1_6ForStAX.patch,
> 400-part2-test-OdfFileDom_xmlDeclTest.patch,
> 400-part3-main-OdfFileDom_initXmlDecl.patch, testOdt.odt
>
>
> I'm trying to convert odt to html. In doing the conversion I'm trying to obtain
> the charset encoding of the odt document so that I can set the appropriate
> value on the html end. However, I always get a 'null' value when trying to
> read the charset.
> {code}
> // 'is' is an InputStream opened on the .odt file
> OdfTextDocument odfDoc = OdfTextDocument.loadDocument(is);
> System.out.println(odfDoc.getContentDom().getXmlEncoding());
> {code}
> For the attached test document I am expecting to get UTF-8 but always see
> 'null'. This happens on other docs as well.
> Is there a better way to obtain the charset encoding of an odt document?