Hi, can you share an example document which shows the behavior?
Thanks... Dominik.

On Sun, Oct 6, 2019 at 6:48 AM Teresa Kim <teresa....@linguamatics.com.invalid> wrote:

> Hi
>
> I have documents (either 'doc' or 'docx') that contain a special character
> for 'greater than or equal to'. When I convert them with
> 'WordToHtmlConverter' using the code below, those characters come out
> as '('.
>
> I tried with the latest Apache POI release, 4.1.0.
>
> My Java code is:
>
> import java.io.ByteArrayOutputStream;
> import java.io.FileInputStream;
> import java.nio.charset.StandardCharsets;
> import javax.xml.parsers.DocumentBuilderFactory;
> import javax.xml.transform.OutputKeys;
> import javax.xml.transform.Transformer;
> import javax.xml.transform.TransformerFactory;
> import javax.xml.transform.dom.DOMSource;
> import javax.xml.transform.stream.StreamResult;
> import org.apache.poi.hwpf.HWPFDocumentCore;
> import org.apache.poi.hwpf.converter.WordToHtmlConverter;
> import org.apache.poi.hwpf.converter.WordToHtmlUtils;
> import org.w3c.dom.Document;
>
> public class TestWordtoHtmlConverter {
>
>     public static void main(String[] args) {
>         try {
>             // Load the Word document given as the first argument
>             HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(
>                     new FileInputStream(args[0]));
>
>             WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
>                     DocumentBuilderFactory.newInstance().newDocumentBuilder()
>                             .newDocument());
>
>             // Convert to an HTML DOM
>             wordToHtmlConverter.processDocument(wordDocument);
>             Document htmlDocument = wordToHtmlConverter.getDocument();
>             ByteArrayOutputStream out = new ByteArrayOutputStream();
>             DOMSource domSource = new DOMSource(htmlDocument);
>             StreamResult streamResult = new StreamResult(out);
>
>             // Serialize the DOM as indented UTF-8 HTML
>             TransformerFactory tf = TransformerFactory.newInstance();
>             Transformer serializer = tf.newTransformer();
>             serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
>             serializer.setOutputProperty(OutputKeys.INDENT, "yes");
>             serializer.setOutputProperty(OutputKeys.METHOD, "html");
>             serializer.transform(domSource, streamResult);
>             out.close();
>
>             // Decode with the same charset the serializer wrote
>             String result = new String(out.toByteArray(), StandardCharsets.UTF_8);
>             System.out.println(result);
>         } catch (Exception e) {
>             // Report failures instead of swallowing them
>             e.printStackTrace();
>         }
>     }
> }
>
> Is there any way I can correctly identify these symbols?
>
> In the sample document, I am interested in getting 'bad one'.
>
> Thanks
>
> T.
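
One possible explanation, offered as a guess until a sample document confirms it: in the binary .doc format, characters added via Insert > Symbol are, as far as I know, stored as a placeholder '(' (0x28) flagged as a special character, with the real character code and font kept in the run's properties. HWPF's CharacterRun appears to expose isSymbol(), getSymbolCharacter() and getSymbolFont() for such runs (worth checking against the Javadoc of your POI version). If the font is Symbol, the code still has to be translated to Unicode. The sketch below shows only that translation step in plain Java. The class name SymbolFontMapper and its small table are hypothetical and cover just a few entries of the Symbol encoding, and the assumption that codes may arrive offset into the private use range U+F000 to U+F0FF is mine.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Apache POI): translates a handful of
 * Symbol-font character codes to their Unicode equivalents.
 */
public class SymbolFontMapper {

    // Partial table of Symbol-font codes; extend as needed.
    private static final Map<Integer, Character> SYMBOL_TO_UNICODE = new HashMap<>();
    static {
        SYMBOL_TO_UNICODE.put(0xA3, '\u2264'); // less than or equal to
        SYMBOL_TO_UNICODE.put(0xB1, '\u00B1'); // plus-minus
        SYMBOL_TO_UNICODE.put(0xB3, '\u2265'); // greater than or equal to
        SYMBOL_TO_UNICODE.put(0xB9, '\u2260'); // not equal to
    }

    /**
     * Maps a Symbol-font code to Unicode. Assumption: the code may be given
     * either directly (0x00 to 0xFF) or shifted into the private use range
     * U+F000 to U+F0FF. Unknown codes are returned unchanged.
     */
    public static char toUnicode(char c) {
        int code = c;
        if (code >= 0xF000 && code <= 0xF0FF) {
            code -= 0xF000; // strip the private-use offset
        }
        Character mapped = SYMBOL_TO_UNICODE.get(code);
        return mapped != null ? mapped : c;
    }

    public static void main(String[] args) {
        // Print the code point rather than the glyph to avoid console
        // encoding issues.
        System.out.printf("U+%04X%n", (int) toUnicode('\uF0B3')); // U+2265
    }
}
```

In a real converter this would only be applied to runs that report themselves as symbols and whose font is Symbol; other symbol fonts such as Wingdings need their own tables.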