[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #25 from Eyal Rozenberg ---
Can someone summarize the state of this bug at the moment?

--
You are receiving this mail because:
You are the assignee for the bug.
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Eyal Rozenberg changed:

           What    |Removed                     |Added
             Blocks|                            |43808, 103378

Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=43808
[Bug 43808] [META] Right-To-Left and Complex Text Layout language issues (RTL/CTL)
https://bugs.documentfoundation.org/show_bug.cgi?id=103378
[Bug 103378] [META] PDF export bugs and enhancements
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

V Stuart Foote changed:

           What    |Removed                     |Added
           See Also|                            |https://bugs.documentfoundation.org/show_bug.cgi?id=152143
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #24 from stragu ---
Created attachment 173742
  --> https://bugs.documentfoundation.org/attachment.cgi?id=173742&action=edit
PDF as exported by LO 7.3 on Ubuntu 18.04

Also attaching the resulting PDF for completeness' sake.

___
Libreoffice-bugs mailing list
Libreoffice-bugs@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #23 from stragu ---
Created attachment 173741
  --> https://bugs.documentfoundation.org/attachment.cgi?id=173741&action=edit
results of testing on Ubuntu 18.04 with LO 7.3 alpha and Evince as PDF viewer

Interesting indeed!

Here are the results of my tests using:
- LO 7.3 alpha0+
- Ubuntu 18.04
- Evince 3.28.4
- gedit 3.28.1

I can't spot any difference from the original text. This makes me wonder if the issue is specific to Windows, or if Acrobat Reader is the culprit?

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: 113d308155e4b6a67a8510098a7db5f4a6632bdc
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-07-16_21:27:22
Calc: threaded
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #22 from V Stuart Foote ---
Created attachment 173712
  --> https://bugs.documentfoundation.org/attachment.cgi?id=173712&action=edit
Result of OP STR as pasted to Notepad++ UTF-8
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #21 from V Stuart Foote ---
(In reply to stragu from comment #20)
> Not sure if something changed in PDF export along the way? Could you please
> test again with a recent version of LO?

Hmm, strange. Following the OP's STR with Writer 7.3.0 alpha, I exported to PDF, opened it in Acrobat Reader (ver 2021.005.20058) and copied to Notepad++ (bld 7.9.5) in UTF-8 encoding. I get exactly the same malformed Devanagari.

The glyph clusters are not formed correctly, so the words cannot be copied out of the PDF.

The /ActualText structures, when present, would supplement the incorrect ToUnicode strings that drop lexical details. Parsing the actual text runs at Unicode word boundaries would, when the option is enabled and the text is embedded into the PDF export, provide better fidelity to the original text.

=-testing-=
Version: 7.3.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 213430e0bdac0786b30a76a68b43d35647e93912
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

stragu changed:

           What    |Removed                     |Added
                 CC|                            |stephane.guil...@member.fsf.org

--- Comment #20 from stragu ---
I just tested the steps described in the Description, and couldn't reproduce the issue:

On Ubuntu 18.04, using LO 7.0.6 and 7.3 alpha0+, I could copy the text, paste it in Writer, export to PDF, open the PDF in Evince 3.28.4, copy the text and paste it back in Writer or gedit: the result was the same as the original text (as far as I can see).

Not sure if something changed in PDF export along the way? Could you please test again with a recent version of LO?
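The round-trip check described in comment #20 (copy from the PDF viewer, paste, compare with the original) can also be done code point by code point rather than by eye. The following is a minimal Python sketch, not part of LibreOffice or of this bug report; the function names `describe` and `first_difference` are made up for illustration, and normalizing both strings to NFC before comparing is an assumption about what should count as "the same text".

```python
import unicodedata


def describe(text):
    """List every code point in `text` with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def first_difference(original, pasted):
    """Return the index of the first differing code point, or None if equal.

    Both strings are normalized to NFC first (an assumption of this sketch),
    so purely canonical-equivalent encodings are not reported as differences.
    """
    a = unicodedata.normalize("NFC", original)
    b = unicodedata.normalize("NFC", pasted)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other.
        return min(len(a), len(b))
    return None
```

Calling `first_difference(original, pasted)` on the source text and the text pasted from the PDF viewer gives the position where they diverge, and `describe` on the surrounding slice shows which code points (for example a dropped vowel sign or an inserted space) are involved.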
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

V Stuart Foote changed:

           What    |Removed                     |Added
           See Also|                            |https://bugs.documentfoundation.org/show_bug.cgi?id=39667
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

V Stuart Foote changed:

           What    |Removed                     |Added
           See Also|                            |https://bugs.documentfoundation.org/show_bug.cgi?id=118370
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Khaled Hosny changed:

           What    |Removed                     |Added
           See Also|https://bugs.documentfoundation.org/show_bug.cgi?id=58941 |
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #19 from Khaled Hosny ---
(In reply to flywire0 from comment #18)
> I consider libre word pdf characters displayed missing when text copied is a
> serious bug. In my instance the letter 'i' is displayed in the pdf file but
> often missing when text is copied and pasted to another program. eg computer
> commands are pasted incorrectly.

This should be fixed in bug 66597; if you still have an issue with builds that include that fix, please open a new bug. This should be independent of the issue being discussed here.
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Khaled Hosny changed:

           What    |Removed                     |Added
         Depends on|117533                      |

Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=117533
[Bug 117533] Problems with copying text from generated PDF (for Graphite font)
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #18 from flywi...@gmail.com ---
I consider libre word pdf characters displayed missing when text copied is a serious bug. In my instance the letter 'i' is displayed in the pdf file but often missing when text is copied and pasted to another program. eg computer commands are pasted incorrectly.

I have also noticed Text To Speech (TTS) does not work with missing characters in the pdf. Especially when it is a vowel!
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #17 from Heiko Tietze ---
(In reply to Khaled Hosny from comment #16)
> Nothing is “non-latin”-specific about the proposed option.

What would you call CTL and the like in a way that average users understand? IMHO, "Latin" is understood as A..Z, maybe including some special characters like umlauts, but definitely not Arabic, Hebrew, and Asian.
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #16 from Khaled Hosny ---
(In reply to Heiko Tietze from comment #15)
> Putting all comments together, UX recommends implementing an option for this
> /ActualText feature. I suggest the caption "Improve non-latin text export"
> (with default off, meaning nothing changes for western users) and explaining
> the details in the help pages.

Nothing is “non-latin”-specific about the proposed option.
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Heiko Tietze changed:

           What    |Removed                                      |Added
           Keywords|needsUXEval                                  |
                 CC|libreoffice-ux-advise@lists.freedesktop.org  |olivier.hallot@documentfoundation.org, tietze.he...@gmail.com

--- Comment #15 from Heiko Tietze ---
Putting all comments together, UX recommends implementing an option for this /ActualText feature. I suggest the caption "Improve non-latin text export" (with default off, meaning nothing changes for western users) and explaining the details in the help pages.
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Bug 117428 depends on bug 117533, which changed state.

Bug 117533 Summary: Problems with copying text from generated PDF (for Graphite font)
https://bugs.documentfoundation.org/show_bug.cgi?id=117533

           What    |Removed                     |Added
             Status|NEEDINFO                    |RESOLVED
         Resolution|---                         |INVALID
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #14 from Shree Devi Kumar ---
(In reply to Khaled Hosny from comment #12)
> (In reply to Shree Devi Kumar from comment #10)
> > (In reply to Khaled Hosny from comment #9)
> > > The keyword for the proposed change is “per word”: the new option
> > > would skip the algorithm and tag the glyphs of each word with its
> > > text, as a complete unit.
> >
> > @Khaled Any update on this? Can you create a patch for this option so that
> > it can be tested?
>
> I don’t currently have time to work on this, unfortunately.

Ok. Thank you for your work on /ActualText; it is a step in the right direction towards getting fully copyable text from PDFs.
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #13 from Shree Devi Kumar ---
(In reply to V Stuart Foote from comment #11)
> I don't believe Khaled has volunteered to tackle the needed refactoring to
> the PDF export filter and GUI. Check History--clearly not assigned as
> Khaled removed himself, back to NEW

OK. Since he had suggested opening a new bug for this, I had incorrectly assumed that he was planning to work on it.

> Otherwise, is there any objection that implementing an /ActualText flag "per
> word" will mean string selection to copy from PDF will be limited to word
> bounds? Personally I think we need the tagging more than the partial string
> copy.
>
> Assuring correct handling of combining glyphs and Unicode script--and
> presumably OTF font features when implemented (as for bug 58941)--is the
> desired outcome.
>
> Justified from a11y perspective, and needed for accuracy supporting CTL
> scripts.
>
> Is that the UX consensus?

As a user, the ability to copy text from PDF is important. Currently, except for XeLaTeX, I am not aware of any other method of doing so for Devanagari and other Indic scripts.

Please see https://www.wikihow.com/index.php?title=Create-a-Searchable-Hindi-PDF-Using-Lyx-with-Xetex which is a workaround for users who are not comfortable with XeLaTeX to create these searchable/copyable PDFs.

It will be a great benefit to users if this option can be implemented in LibreOffice. Thank you!
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #12 from Khaled Hosny ---
(In reply to Shree Devi Kumar from comment #10)
> (In reply to Khaled Hosny from comment #9)
> > The keyword for the proposed change is “per word”: the new option
> > would skip the algorithm and tag the glyphs of each word with its
> > text, as a complete unit.
>
> @Khaled Any update on this? Can you create a patch for this option so that
> it can be tested?

I don’t currently have time to work on this, unfortunately.
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

V Stuart Foote changed:

           What    |Removed                     |Added
             Status|ASSIGNED                    |NEW
           See Also|                            |https://bugs.documentfoundation.org/show_bug.cgi?id=58941

--- Comment #11 from V Stuart Foote ---
I don't believe Khaled has volunteered to tackle the needed refactoring to the PDF export filter and GUI. Check History--clearly not assigned as Khaled removed himself, back to NEW

Otherwise, is there any objection that implementing an /ActualText flag "per word" will mean string selection to copy from PDF will be limited to word bounds? Personally I think we need the tagging more than the partial string copy.

Assuring correct handling of combining glyphs and Unicode script--and presumably OTF font features when implemented (as for bug 58941)--is the desired outcome.

Justified from a11y perspective, and needed for accuracy supporting CTL scripts.

Is that the UX consensus?
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Shree Devi Kumar changed:

           What    |Removed                     |Added
             Status|NEW                         |ASSIGNED

--- Comment #10 from Shree Devi Kumar ---
(In reply to Khaled Hosny from comment #9)
> We do export the text already, using a clever algorithm that minimizes
> file size impact and keeps individual characters selectable (as much as
> possible), but it fails in minor ways with some readers second-guessing us
> and inserting random spaces in the middle of the word.

For Indic languages this was happening in ALL readers that I tested.

> The keyword for the proposed change is “per word”: the new option
> would skip the algorithm and tag the glyphs of each word with its
> text, as a complete unit.

@Khaled Any update on this? Can you create a patch for this option so that it can be tested?
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Volga changed:

           What    |Removed                     |Added
         Depends on|                            |117533

Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=117533
[Bug 117533] Problems with copying text from generated PDF (for Graphite font)
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

--- Comment #9 from Khaled Hosny ---
(In reply to Heiko Tietze from comment #8)
> > (In reply to Khaled Hosny from comment #4)
> > 2) What exact wording to use, /ActualText is a jargon
> "Export raw text", "Export actual text", "Export source"...

We do export the text already, using a clever algorithm that minimizes file size impact and keeps individual characters selectable (as much as possible), but it fails in minor ways with some readers second-guessing us and inserting random spaces in the middle of the word. The keyword for the proposed change is “per word”: the new option would skip the algorithm and tag the glyphs of each word with its text, as a complete unit. This fixes the issue, but introduces a new one; you can no longer select parts of the word, it is now a single unit. The option text needs to relay some of this to the user.
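To make the "per word" idea in comment #9 concrete, here is a rough Python sketch of what tagging each word as a single unit could look like in a PDF content stream. It is not LibreOffice's export code. It assumes words are separated by plain whitespace (a real implementation would more likely use Unicode word-boundary rules), and `glyph_ops_for` is a hypothetical callback standing in for whatever emits the glyph-showing operators of a word. The `/Span ... BDC` / `EMC` marked-content operators and the `/ActualText` entry are standard PDF; the text is written as a hex string in UTF-16BE with a leading byte-order mark.

```python
def actualtext_hex(word):
    """Encode `word` as a PDF hex string: UTF-16BE with a leading BOM."""
    return "<" + ("\ufeff" + word).encode("utf-16-be").hex().upper() + ">"


def per_word_spans(text, glyph_ops_for):
    """Wrap the glyphs of each whitespace-separated word in one marked span.

    `glyph_ops_for` is a hypothetical callback that returns the content-stream
    operators drawing the glyphs of one word.
    """
    lines = []
    for word in text.split():
        lines.append(f"/Span << /ActualText {actualtext_hex(word)} >> BDC")
        lines.append("  " + glyph_ops_for(word))
        lines.append("EMC")
    return "\n".join(lines)
```

For example, `per_word_spans("A B", lambda w: "% glyphs")` yields two spans, the first opening with `/Span << /ActualText <FEFF0041> >> BDC`. Because each span covers a whole word, a viewer that honors `/ActualText` returns the word's original text on copy, which is also why part of a word can no longer be selected on its own. This sketch does not tag the spaces between words.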
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Khaled Hosny changed:

           What    |Removed                     |Added
             Status|ASSIGNED                    |NEW
           Assignee|khaledho...@eglug.org       |libreoffice-b...@lists.freedesktop.org
[Libreoffice-bugs] [Bug 117428] add an option to PDF export dialog to do ActualText per word
https://bugs.documentfoundation.org/show_bug.cgi?id=117428

Shree Devi Kumar changed:

           What    |Removed                                 |Added
                 CC|                                        |khaledho...@eglug.org
           Assignee|libreoffice-b...@lists.freedesktop.org  |khaledho...@eglug.org