Benoit Tellier created JAMES-4061:
-------------------------------------
Summary: Html Text extractor needs to handle blockquote
Key: JAMES-4061
URL: https://issues.apache.org/jira/browse/JAMES-4061
Project: James Server
Issue Type: Bug
Components: JMAP
Affects Versions: master
Reporter: Benoit Tellier
Assignee: Antoine Duprat
Attachments: image-2024-08-22-14-54-37-915.png,
image-2024-08-22-14-54-51-684.png, image-2024-08-22-14-55-01-317.png
Following recent mailing list exchanges, Wojtek contacted me privately to point
out the broken indentation of my inline answers.
The exchange:
https://www.mail-archive.com/[email protected]/msg74362.html
Setup: throughout the discussion I used the Twake Mail client, which produces HTML
and relies on the James server JMAP code to generate the text/plain part. Wojtek
prefers reading the text/plain part when it is available.
The full diagnostic below is taken from a private conversation:
h3. Diagnostic
I bet it is the plain-text projection of the email that got screwed up. The HTML
version looks fine:
!image-2024-08-22-14-54-37-915.png!
This matches the output I see in my sent mails in Twake Mail:
!image-2024-08-22-14-54-51-684.png!
However, the text/plain version is indeed missing one quoting level:
!image-2024-08-22-14-55-01-317.png!
What we have
>> Your initial concern
> My initial answer
Your answer
My answer to your answer
What we should have
>>> Your initial concern
>> My initial answer
> Your answer
My answer to your answer
Where it gets annoying is that our webmail (
https://github.com/apache/james-project ) generates HTML output (WYSIWYG),
and the backend then extracts the text from that HTML in order to present a
text/plain view of the message. The <blockquote> tags are currently ignored
during this extraction, so one quoting level is lost.
The component converting HTML to text needs to account for these blockquotes:
it should keep track of the number of enclosing blockquotes in the current
context and prefix every line it emits with that many '>' quote markers. For
instance, the following HTML:
<blockquote><p>abc</p><p>def<br/>ghi</p><blockquote><p>jkl</p><p>mno<br/></p></blockquote><p>pqr</p></blockquote><p>stu</p>
should be rendered as:
> abc
> def
> ghi
>> jkl
>> mno
> pqr
stu
The involved component is a JMAP utility of Apache James:
org.apache.james.jmap.utils.JsoupHtmlTextExtractor
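A minimal sketch of the expected behaviour, written in Java like the rest of James. It is NOT the actual JsoupHtmlTextExtractor code: to stay self-contained it swaps jsoup for a naive regex tokenizer, and the class and method names (BlockquoteTextSketch, toPlainText) are hypothetical. It counts how many <blockquote> elements are open, flushes the pending text as one line on <p>, <br> and <div> boundaries, and prefixes each line with that many '>' characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: HTML to text/plain conversion that tracks blockquote depth.
 * Assumption: a naive regex tokenizer stands in for jsoup to keep the example self-contained.
 */
final class BlockquoteTextSketch {

    // Group 1: "/" for closing tags, group 2: tag name, group 3: text between tags.
    private static final Pattern TOKEN =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)");

    static String toPlainText(String html) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of <blockquote> elements currently open

        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(3) != null) {
                current.append(m.group(3)); // plain text: accumulate into the pending line
                continue;
            }
            boolean closing = !m.group(1).isEmpty();
            String tag = m.group(2).toLowerCase(Locale.ROOT);
            switch (tag) {
                case "blockquote":
                    // Flush BEFORE changing depth so pending text keeps the level it was written at.
                    flush(lines, current, depth);
                    depth = closing ? Math.max(0, depth - 1) : depth + 1;
                    break;
                case "p":
                case "br":
                case "div":
                    // Block boundaries and line breaks end the current line.
                    flush(lines, current, depth);
                    break;
                default:
                    // Inline tags (b, i, span, ...) are dropped; their text is kept.
                    break;
            }
        }
        flush(lines, current, depth);
        return String.join("\n", lines);
    }

    private static void flush(List<String> lines, StringBuilder current, int depth) {
        String text = current.toString().trim();
        current.setLength(0);
        if (text.isEmpty()) {
            return; // skip empty lines produced by adjacent block boundaries
        }
        String prefix = depth > 0 ? ">".repeat(depth) + " " : "";
        lines.add(prefix + text);
    }

    public static void main(String[] args) {
        String html = "<blockquote><p>abc</p><p>def<br/>ghi</p><blockquote><p>jkl</p>"
            + "<p>mno<br/></p></blockquote><p>pqr</p></blockquote><p>stu</p>";
        System.out.println(toPlainText(html));
    }
}
```

Running main on the example HTML prints the seven expected lines, from "> abc" down to "stu". The flush before each depth change is what keeps "ghi" at one level and "jkl" at two. Known limits of this sketch: HTML entities are not decoded, comments and doctypes are not handled, and intentionally blank lines are dropped. A jsoup-based implementation would get a real DOM and could apply the same depth counter while visiting nodes.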
--
This message was sent by Atlassian Jira
(v8.20.10#820010)