Benoit Tellier created JAMES-4061:
-------------------------------------

             Summary: Html Text extractor needs to handle blockquote
                 Key: JAMES-4061
                 URL: https://issues.apache.org/jira/browse/JAMES-4061
             Project: James Server
          Issue Type: Bug
          Components: JMAP
    Affects Versions: master
            Reporter: Benoit Tellier
            Assignee: Antoine Duprat
         Attachments: image-2024-08-22-14-54-37-915.png, 
image-2024-08-22-14-54-51-684.png, image-2024-08-22-14-55-01-317.png

Following recent mailing list exchanges, Wojtek contacted me privately to point 
out the broken indentation of my inline answers.

The exchange: 
https://www.mail-archive.com/server-dev@james.apache.org/msg74362.html

Setup: I used the Twake mail client throughout the discussion. It produces HTML 
and relies on James server JMAP code to generate the text/plain part. Wojtek 
prefers reading text/plain when available.

Full diagnostic is taken from a private conversation:

h3. Diagnostic

I bet it is the plain text projection of the email that got screwed up. The 
HTML version looks fine

 !image-2024-08-22-14-54-37-915.png! 

This matches the output I see in my sent mails in Twake mail

 !image-2024-08-22-14-54-51-684.png! 

However, the text/plain version is indeed missing one quote level

 !image-2024-08-22-14-55-01-317.png! 

What we have

>> Your initial concern
> My initial answer
Your answer
My answer to your answer

What we should have

>>> Your initial concern
>> My initial answer
> Your answer
My answer to your answer

The annoying part is that our Webmail ( 
https://github.com/apache/james-project ) generates HTML output (WYSIWYG), 
and the backend then extracts the text from the HTML in order to present a 
text/plain view of the message. The <blockquote> tags are currently ignored.

The component converting HTML to text needs to account for these blockquotes: 
keep track of the blockquote nesting depth of the current context and start 
each line after a line break with the matching number of '>' characters.

<blockquote><p>abc</p><p>def<br/>ghi</p><blockquote><p>jkl</p><p>mno<br/></p></blockquote><p>pqr</p></blockquote><p>stu</p>

Should be converted to

> abc
> def
> ghi
>> jkl
>> mno
> pqr
stu

The involved component is a JMAP utility of Apache James: 
org.apache.james.jmap.utils.JsoupHtmlTextExtractor
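
To make the expected behaviour concrete, here is a minimal sketch of the depth-tracking idea in plain Java. It is not the JsoupHtmlTextExtractor code and does not use Jsoup: the class name BlockquoteTextSketch, the method toPlainText and the regex-based tag scanner are all assumptions made for illustration. It treats only <p>, <br> and <blockquote> as line boundaries, drops empty lines, and does not decode HTML entities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not the James implementation, no Jsoup involved.
final class BlockquoteTextSketch {
    // Either a tag (group 1 = optional '/', group 2 = tag name) or a run of text (group 3).
    private static final Pattern TOKEN =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)");

    static String toPlainText(String html) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of <blockquote> elements currently open
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(3) != null) {
                current.append(m.group(3)); // plain text: accumulate into the current line
                continue;
            }
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            switch (name) {
                case "blockquote":
                    // Emit pending text at the old depth, then change the depth.
                    flush(lines, current, depth);
                    depth = closing ? Math.max(0, depth - 1) : depth + 1;
                    break;
                case "p":
                case "br":
                    flush(lines, current, depth); // line boundary at the current depth
                    break;
                default:
                    break; // other tags are ignored in this sketch
            }
        }
        flush(lines, current, depth);
        return String.join("\n", lines);
    }

    // Emits the pending line prefixed with one '>' per open blockquote; skips empty lines.
    private static void flush(List<String> lines, StringBuilder current, int depth) {
        String text = current.toString().trim();
        current.setLength(0);
        if (text.isEmpty()) {
            return;
        }
        String prefix = depth == 0 ? "" : ">".repeat(depth) + " ";
        lines.add(prefix + text);
    }

    public static void main(String[] args) {
        // The nested sample from this issue, written with well-formed paragraph tags.
        String html = "<blockquote><p>abc</p><p>def<br/>ghi</p><blockquote><p>jkl</p>"
            + "<p>mno<br/></p></blockquote><p>pqr</p></blockquote><p>stu</p>";
        System.out.println(toPlainText(html));
    }
}
```

In the real extractor the same counter could presumably be maintained while walking the parsed document tree rather than scanning tags with a regex. The point of the sketch is only that the prefix is derived from the nesting depth at the moment a line is emitted.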


