[ https://issues.apache.org/jira/browse/JAMES-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876718#comment-17876718 ]

Benoit Tellier commented on JAMES-4062:
---------------------------------------

This afternoon I took a shot at JAMES-4062. The implementation was naive, though I noted:

 - https://github.com/vsch/flexmark-java/issues/625 Out of memory errors when rendering many nested blockquotes.
 - https://github.com/vsch/flexmark-java/issues/626 Stack overflow. This was solved in James by replacing recursion with iteration over heap-based stacks.

Lessons learned: when rendering HTML as plain text there is an asymmetry between input size and output size that can be tied to a difference in algorithmic memory complexity. https://github.com/vsch/flexmark-java/issues/625 gives several examples of inputs of size O(n) that produce output of size O(n^2).

In any case, we need to defend against this kind of data amplification through HTML => plain text conversion.

> Experiment flexmark for HTML text extraction
> --------------------------------------------
>
>                 Key: JAMES-4062
>                 URL: https://issues.apache.org/jira/browse/JAMES-4062
>             Project: James Server
>          Issue Type: Improvement
>          Components: JMAP
>            Reporter: Benoit Tellier
>            Assignee: Antoine Duprat
>            Priority: Major
>
> JMAP code currently relies on homegrown rendering code plugged onto an HTML
> parser.
> Though the code kind of works, it is not core code from ASF James and we
> regularly miss some formatting options;
> https://issues.apache.org/jira/browse/JAMES-4061 is a good example of it.
> An alternative could be to rely on a battle-tested, general-purpose
> library, e.g. https://github.com/vsch/flexmark-java and
> flexmark-html2md-converter as suggested privately by Wojtek.
> Related code would likely handle all corner cases without us thinking about
> it.
> Also we could offer a JVM option for switching between it and the current
> jsoup implementation, which would stay the default (while we experiment
> with the flexmark option).
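
To illustrate the two points raised in the comment (iteration over a heap-based stack instead of recursion, and a defence against output amplification), here is a minimal sketch. It is not the James or flexmark code: the `Node` type, the `render` and `nested` methods and the `maxOutputChars` limit are all hypothetical names introduced for this example, and a real implementation would walk the tree produced by the HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BoundedTextExtractor {
    /** Hypothetical minimal node standing in for a parsed HTML element. */
    static final class Node {
        final String tag;   // e.g. "blockquote", or "#text" for text nodes
        final String text;  // only used when tag is "#text"
        final List<Node> children = new ArrayList<>();

        Node(String tag, String text) { this.tag = tag; this.text = text; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Work item kept on the heap: a node plus its blockquote nesting depth. */
    private static final class Frame {
        final Node node;
        final int depth;
        Frame(Node node, int depth) { this.node = node; this.depth = depth; }
    }

    /** Renders text nodes with one "> " prefix per enclosing blockquote. */
    static String render(Node root, int maxOutputChars) {
        StringBuilder out = new StringBuilder();
        Deque<Frame> stack = new ArrayDeque<>(); // explicit stack, no recursion
        stack.push(new Frame(root, 0));
        while (!stack.isEmpty()) {
            Frame frame = stack.pop();
            Node node = frame.node;
            if ("#text".equals(node.tag)) {
                // A line at depth d costs 2*d prefix characters: with n nested
                // quotes the total output can grow as O(n^2) for O(n) input.
                long needed = 2L * frame.depth + node.text.length() + 1;
                if (out.length() + needed > maxOutputChars) {
                    throw new IllegalStateException("Output limit exceeded");
                }
                for (int i = 0; i < frame.depth; i++) {
                    out.append("> ");
                }
                out.append(node.text).append('\n');
                continue;
            }
            int childDepth = "blockquote".equals(node.tag) ? frame.depth + 1 : frame.depth;
            // Push children in reverse so they are popped in document order.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(new Frame(node.children.get(i), childDepth));
            }
        }
        return out.toString();
    }

    /** Builds 'depth' nested blockquotes, iteratively, as test input. */
    static Node nested(int depth, boolean textAtEveryLevel) {
        Node root = new Node("div", null);
        Node current = root;
        for (int i = 0; i < depth; i++) {
            Node quote = new Node("blockquote", null);
            if (textAtEveryLevel || i == depth - 1) {
                quote.add(new Node("#text", "x"));
            }
            current.add(quote);
            current = quote;
        }
        return root;
    }
}
```

With this sketch, `render(nested(50000, false), 1_000_000)` completes without exhausting the call stack, while `render(nested(5000, true), 1_000_000)` is rejected once the rendered text would pass the limit. Failing fast is only one possible policy; truncating the output would be another.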