[jira] [Commented] (ODFTOOLKIT-400) Unable to obtain the charset encoding of an odt document

Michael Stahl (JIRA) Mon, 14 Sep 2015 03:17:25 -0700

    [ 
https://issues.apache.org/jira/browse/ODFTOOLKIT-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743327#comment-14743327
 ]


Michael Stahl commented on ODFTOOLKIT-400:
------------------------------------------

it's a bit of a mystery to me what the point of this is.

the whole point of using an XML parser is so that you don't have to care about 
things like what charset and encoding the source document uses - it's all 
abstracted away and you only have to deal with nice and uniform Unicode text in 
your application.

specifically, the Java XML parsers all create java.lang.Strings which are 
*always* UTF-16 encoded Unicode.
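
to illustrate, here is a minimal sketch (not ODF Toolkit code) that uses the 
JDK's built-in DOM parser. the class and method names (EncodingAgnosticParse, 
parseText) are made up for this example. it serializes the same tiny document 
in two different encodings and parses both; the resulting Strings are equal, 
whatever the source bytes looked like.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of the ODF Toolkit.
class EncodingAgnosticParse {

    // Serializes a tiny XML document in the given charset, parses it back,
    // and returns the root element's text as a java.lang.String.
    static String parseText(Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>"
                + "<p>caf\u00e9</p>";
        byte[] bytes = xml.getBytes(charset);
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Parser configuration, SAX, and I/O errors are all fatal for this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fromUtf8 = parseText(StandardCharsets.UTF_8);
        String fromLatin1 = parseText(StandardCharsets.ISO_8859_1);
        // The byte sequences differed, but the decoded text is identical.
        System.out.println(fromUtf8.equals(fromLatin1));
    }
}
```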

so if you want to export UTF-8 encoded HTML, *just do it* by encoding the 
strings at the point when you write them into the generated file.
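
a rough sketch of what that could look like, assuming the HTML is written to 
a plain OutputStream. Utf8HtmlWriter and writeHtml are hypothetical names, and 
the body text is assumed to be HTML-escaped already. the only place the 
encoding shows up is the OutputStreamWriter (plus the matching meta tag so 
browsers decode the file the same way).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the ODF Toolkit.
class Utf8HtmlWriter {

    // Writes a minimal HTML page. The Writer encodes the String's characters
    // as UTF-8 on the way out; the input encoding of the ODF file is irrelevant.
    static void writeHtml(String bodyText, OutputStream out) {
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write("<!DOCTYPE html>\n<html><head><meta charset=\"UTF-8\"></head><body><p>");
            w.write(bodyText); // assumed to be HTML-escaped by the caller
            w.write("</p></body></html>\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeHtml("caf\u00e9", buf);
        System.out.println(buf.size() + " bytes written");
    }
}
```

in a real converter the OutputStream would be a FileOutputStream or a servlet 
response stream instead of the in-memory buffer used here.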


> Unable to obtain the charset encoding of an odt document
> --------------------------------------------------------
>
>                 Key: ODFTOOLKIT-400
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-400
>             Project: ODF Toolkit
>          Issue Type: Bug
>          Components: odfdom
>         Environment: linux - ubuntu 14.04
>            Reporter: Joshua
>         Attachments: 400-part1-pom_xml-FromJava1_5To1_6ForStAX.patch, 
> 400-part2-test-OdfFileDom_xmlDeclTest.patch, 
> 400-part3-main-OdfFileDom_initXmlDecl.patch, testOdt.odt
>
>
> I'm trying to convert odt to html. In doing the conversion I'm trying to obtain 
> the charset encoding of the odt document so that I can set the appropriate 
> value on the html end. However, I always get a 'null' value when trying to 
> read the charset.
> {code}
>         OdfTextDocument odfDoc = OdfTextDocument.loadDocument(is);
>         System.out.println(odfDoc.getContentDom().getXmlEncoding());
> {code}
> For the test document attached I am expecting to get UTF-8 but always see 
> 'null'. This happens on other docs as well.
> Is there a better way to obtain the charset encoding of an odt document?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
