Hi Dominik

Sure, I attached the symbol_test.doc document to my previous email.

It seems I cannot attach the document to an email?

Is there any other way I can share the document?


Thanks

T.

On 06/10/2019 16:29, Dominik Stadler wrote:
Hi,

can you share an example document which shows the behavior?

Thanks... Dominik.


On Sun, Oct 6, 2019 at 6:48 AM Teresa Kim
<teresa....@linguamatics.com.invalid> wrote:

Hi


I have documents (either 'doc' or 'docx') that contain a special character
for 'greater than or equal to', and when I convert them using
'WordToHtmlConverter', I see those characters are converted into '('.

I tried this with the latest Apache POI release, 4.1.0.


My Java code is:


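import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;
import org.w3c.dom.Document;

public class TestWordtoHtmlConverter {

      public static void main(String[] args) {
          // try-with-resources closes the input file when done
          try (InputStream in = new FileInputStream(args[0])) {
              HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(in);

              // The converter builds the HTML into an empty DOM document
              WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                      DocumentBuilderFactory.newInstance().newDocumentBuilder()
                              .newDocument());

              wordToHtmlConverter.processDocument(wordDocument);
              Document htmlDocument = wordToHtmlConverter.getDocument();
              ByteArrayOutputStream out = new ByteArrayOutputStream();
              DOMSource domSource = new DOMSource(htmlDocument);
              StreamResult streamResult = new StreamResult(out);

              // Serialize the DOM as indented UTF-8 HTML
              TransformerFactory tf = TransformerFactory.newInstance();
              Transformer serializer = tf.newTransformer();
              serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
              serializer.setOutputProperty(OutputKeys.INDENT, "yes");
              serializer.setOutputProperty(OutputKeys.METHOD, "html");
              serializer.transform(domSource, streamResult);
              out.close();

              // Decode with the same charset the serializer wrote
              String result = new String(out.toByteArray(), StandardCharsets.UTF_8);
              System.out.println(result);
          } catch (Exception e) {
              // Report failures instead of silently swallowing them
              e.printStackTrace();
          }
      }
}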

Is there any way I can correctly identify these symbols?


In the sample document, I am interested in getting 'bad one'.


Thanks

T.





---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@poi.apache.org
For additional commands, e-mail: user-h...@poi.apache.org

