Benoit Tellier created JAMES-4062:
-------------------------------------

             Summary: Experiment flexmark for HTML text extraction
                 Key: JAMES-4062
                 URL: https://issues.apache.org/jira/browse/JAMES-4062
             Project: James Server
          Issue Type: Improvement
          Components: JMAP
            Reporter: Benoit Tellier
            Assignee: Antoine Duprat


The JMAP code currently relies on homegrown rendering code plugged onto an
HTML parser.

Though the code mostly works, this is not core functionality for ASF James,
and we regularly miss some formatting options:
https://issues.apache.org/jira/browse/JAMES-4061 is a good example.

An alternative could be to rely on a battle-tested, general-purpose library,
e.g. https://github.com/vsch/flexmark-java and its
flexmark-html2md-converter module, as suggested privately by Wojtek.

Such a library would likely handle the corner cases without us having to
think about them.

We could also offer a JVM option for switching between it and the current
jsoup implementation, which would stay the default while we experiment with
the flexmark option.
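
As a rough illustration of such a switch, here is a minimal sketch of reading
a JVM system property and falling back to jsoup by default. The property name
james.jmap.html.text.extractor, the class name and the enum are hypothetical
and are not existing James options; the actual extraction code is left out.

```java
import java.util.Locale;

public class HtmlTextExtractorSelection {
    // Hypothetical property name, chosen for illustration only.
    static final String PROPERTY = "james.jmap.html.text.extractor";

    // The two candidate implementations discussed in this ticket.
    enum Implementation { JSOUP, FLEXMARK }

    /** Parses the option value; missing or unknown values fall back to jsoup. */
    static Implementation parse(String value) {
        if (value == null) {
            return Implementation.JSOUP;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "flexmark":
                return Implementation.FLEXMARK;
            default:
                return Implementation.JSOUP;
        }
    }

    /** Reads the choice from the JVM system properties (-D... on the command line). */
    static Implementation fromSystemProperty() {
        return parse(System.getProperty(PROPERTY));
    }

    public static void main(String[] args) {
        System.out.println("HTML text extractor: " + fromSystemProperty());
    }
}
```

Under these assumptions, starting the JVM with
-Djames.jmap.html.text.extractor=flexmark would select the experimental
implementation, and any other value (or no value) would keep jsoup.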



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
