[ https://issues.apache.org/jira/browse/ODFTOOLKIT-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743327#comment-14743327 ]
Michael Stahl commented on ODFTOOLKIT-400:
------------------------------------------

it's a bit of a mystery to me what the point of this is.

the whole point of using an XML parser is so that you don't have to care about things like what charset and encoding the source document uses - it's all abstracted away and you only have to deal with nice and uniform Unicode text in your application.

specifically, the Java XML parsers all create java.lang.Strings, which are *always* UTF-16 encoded Unicode.

so if you want to export UTF-8 encoded HTML, *just do it* by encoding the strings at the point when you write them into the generated file.

> Unable to obtain the charset encoding of an odt document
> --------------------------------------------------------
>
>                 Key: ODFTOOLKIT-400
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-400
>             Project: ODF Toolkit
>          Issue Type: Bug
>          Components: odfdom
>         Environment: linux - ubuntu 14.04
>            Reporter: Joshua
>         Attachments: 400-part1-pom_xml-FromJava1_5To1_6ForStAX.patch,
> 400-part2-test-OdfFileDom_xmlDeclTest.patch,
> 400-part3-main-OdfFileDom_initXmlDecl.patch, testOdt.odt
>
>
> I'm trying to convert odt to html. In doing the conversion I'm trying to obtain
> the charset encoding of the odt document so that I can set the appropriate
> value on the html end. However, I always get a 'null' value when trying to
> read the charset.
> {code}
> OdfTextDocument odfDoc = OdfTextDocument.loadDocument(is);
> System.out.println(odfDoc.getContentDom().getXmlEncoding());
> {code}
> For the test document attached I am expecting to get UTF-8 but always see
> 'null'. This happens on other docs as well.
> Is there a better way to obtain the charset encoding of an odt document?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
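
The suggestion in the comment above (pick the output encoding at the point where the strings are written) could look roughly like the sketch below. It is only an illustration and not part of ODF Toolkit: the class name Utf8HtmlExport and the helpers writeHtml and escape are hypothetical, and it assumes Java 7 or later for StandardCharsets. The text passed in stands for whatever strings were already extracted from the document.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an ODF Toolkit class.
public class Utf8HtmlExport {

    // Wraps the text in a minimal HTML page. The String itself carries no
    // "source encoding"; UTF-8 is chosen here, when chars become bytes.
    static void writeHtml(String text, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write("<!DOCTYPE html>\n<html><head><meta charset=\"UTF-8\"></head><body><p>");
        w.write(escape(text));
        w.write("</p></body></html>\n");
        w.flush(); // push buffered characters through the encoder
    }

    // Escapes the three characters that are significant in HTML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        // Unicode escapes keep this source file ASCII-only.
        writeHtml("Gr\u00fc\u00dfe & more", buf);
        System.out.write(buf.toByteArray());
        System.out.flush();
    }
}
```

With this approach the charset declared in the generated HTML is whatever the writer was configured with, so the encoding of the original content.xml never needs to be queried.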