[Libreoffice-bugs] [Bug 152204] Many HTML entity names not supported

2022-11-24 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152204

--- Comment #6 from V Stuart Foote  ---
(In reply to V Stuart Foote from comment #5)
> Created attachment 183759 [details]
> LO ODF text document sample with mix of named character entity values
> 
> For named character entities added in HTML5, which the LO import filter
> does not handle, the unrecognized entity is left in its literal "&name;"
> form on import.

Oops, sorry: that is actually an HTML file generated from LO 7.5, then edited
a bit to clean up the HTML formatting and put each stanza on its own row.

So when it is opened in LibreOffice, select the HTML source view in the
Writer Web module to see which entities the import filter misses.

-- 
You are receiving this mail because:
You are the assignee for the bug.

--- Comment #5 from V Stuart Foote  ---
Created attachment 183759
  --> https://bugs.documentfoundation.org/attachment.cgi?id=183759&action=edit
LO ODF text document sample with mix of named character entity values

For named character entities added in HTML5, which the LO import filter does
not handle, the unrecognized entity is left in its literal "&name;" form on import.
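
To illustrate the failure mode described above, here is a minimal Python sketch; it is not LibreOffice's C++ filter code. It assumes, purely for illustration, a decoder whose lookup table holds only the HTML 4.01 names from Python's html.entities.name2codepoint. Names outside that table, such as the HTML5 additions Abreve and abreve, pass through as literal text, while Python's HTML5-aware html.unescape resolves them. decode_html4_only is a hypothetical helper.

```python
import html
import re
from html.entities import name2codepoint  # HTML 4.01 named entities only

# A named character reference with its terminating semicolon, e.g. "&abreve;".
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_html4_only(text: str) -> str:
    """Hypothetical decoder that knows only the HTML 4.01 entity names.

    Any other name is left in the text unchanged, mimicking the reported
    import behaviour.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return match.group(0)  # unknown name: keep the literal "&name;" text

    return ENTITY_RE.sub(replace, text)


sample = "caf&eacute; &Abreve;&abreve;"
print(decode_html4_only(sample))  # café &Abreve;&abreve;
print(html.unescape(sample))      # café Ăă
```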

V Stuart Foote  changed:

     What|Removed |Added
       CC|        |vsfo...@libreoffice.org
 See Also|        |https://bugs.documentfoundation.org/show_bug.cgi?id=36977
   Blocks|        |103302

--- Comment #4 from V Stuart Foote  ---
Confirming.

Version: 7.5.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 651658d37bcb3f493942dd5d0b9a0d65c96f105c
CPU threads: 8; OS: Windows 10.0 Build 19044; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

LibreOffice's import filter does not handle the additional HTML/XML named
character entities added in HTML5, not just  or  as here. The unhandled
entities are not converted to the appropriate glyph on the LibreOffice
document canvas and remain as plain text.
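
One rough way to gauge how many names HTML5 added is to diff the two name sets that ship with Python's standard library: html.entities.name2codepoint (HTML 4.01) and html.entities.html5 (the HTML5 named character references). This sketch does not inspect LibreOffice's own entity table, so treating the HTML 4.01 set as a stand-in for what the filter already knows is an assumption.

```python
from html.entities import html5, name2codepoint

# name2codepoint: HTML 4.01 entity names mapped to code points.
# html5: HTML5 named character references; keys include the trailing ";"
# (a few legacy names are also listed without it, e.g. both "amp" and "amp;").
html4_names = set(name2codepoint)
html5_names = {key.rstrip(";") for key in html5}

# Names that an HTML 4-era lookup table would not recognize.
added_in_html5 = sorted(html5_names - html4_names)

print(f"HTML 4.01 names: {len(html4_names)}")
print(f"HTML5 names:     {len(html5_names)}")
print(f"added in HTML5:  {len(added_in_html5)}")
print("Abreve" in added_in_html5, "abreve" in added_in_html5)  # True True
```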

Dante did add the handling needed for MathML support with
https://gerrit.libreoffice.org/c/core/+/108333

But something similar is needed for Writer Web to parse these characters
from HTML/XHTML or XML.
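
One possible shape for such handling, sketched in Python under stated assumptions: generate the lookup table from the JSON version of the WHATWG named-character list (entities.json), keeping only the semicolon-terminated names, then resolve references against it. build_entity_table and decode_named are hypothetical names; a real fix would live in LibreOffice's C++ HTML parser, and this does not reproduce the MathML change linked above. The sample below is a small hand-written excerpt in the entities.json layout, not the full file.

```python
import json
import re
from typing import Dict

# A named character reference with its terminating semicolon, e.g. "&Abreve;".
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def build_entity_table(entities_json: str) -> Dict[str, str]:
    """Map entity names (without '&' and ';') to their replacement text.

    Expects keys such as "&Abreve;" whose values carry a "characters" string.
    """
    table: Dict[str, str] = {}
    for key, value in json.loads(entities_json).items():
        if key.endswith(";"):  # skip legacy forms written without ';'
            table[key[1:-1]] = value["characters"]
    return table


def decode_named(text: str, table: Dict[str, str]) -> str:
    """Replace known names; leave unknown references as literal text."""
    return ENTITY_RE.sub(lambda m: table.get(m.group(1), m.group(0)), text)


SAMPLE_JSON = """{
  "&Abreve;": {"codepoints": [258], "characters": "\\u0102"},
  "&abreve;": {"codepoints": [259], "characters": "\\u0103"},
  "&amp":     {"codepoints": [38],  "characters": "&"},
  "&amp;":    {"codepoints": [38],  "characters": "&"}
}"""

table = build_entity_table(SAMPLE_JSON)
print(sorted(table))                                    # ['Abreve', 'abreve', 'amp']
print(decode_named("&Abreve;&abreve; &bogus;", table))  # Ăă &bogus;
```

Generating the table from the published list, rather than typing entries by hand, keeps it complete; the WHATWG spec states the list is static, so a one-time generation step would suffice.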

The attached test ODF text document shows examples of named entities that
LibreOffice Writer Web import does not handle.

=-ref-=
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references


Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=103302
[Bug 103302] [META] Writer's web layout/view bugs and enhancements

Stéphane Guillou (stragu)  changed:

     What|Removed                                              |Added
  Summary|Unicode HTML entity a breve (Ă) is not rendered with |Many HTML entity names not supported
Component|LibreOffice                                          |filters and storage
