https://bugs.documentfoundation.org/show_bug.cgi?id=117428
Bug ID: 117428
Summary: add an option to PDF export dialog to do ActualText per word
Product: LibreOffice
Version: 6.1.0.0.alpha1+ Master
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: enhancement
Priority: medium
Component: Printing and PDF export
Assignee: libreoffice-bugs@lists.freedesktop.org
Reporter: shreesh...@gmail.com
Description:
A new feature added to 6.1.0 by Khaled Hosny allows text to be copied and
extracted from PDFs using ActualText. However, it does not work completely
for complex scripts: the copied text contains spurious spaces inside words.
ActualText per word has been suggested as a possible solution. Khaled has
suggested offering this as an option in the PDF export dialog rather than
enabling it by default.
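As a rough illustration of what "ActualText per word" could mean at the PDF
level: each word's glyph-showing operators would sit in their own /Span
marked-content sequence, whose ActualText entry holds that word's Unicode text.
The Python sketch below only builds such a fragment as a string. It is not
LibreOffice's export code; the function names and the glyph operator strings
passed in are hypothetical placeholders.

```python
def actual_text_span(word, glyph_ops):
    """Wrap glyph operators in a /Span sequence carrying ActualText for one word.

    `glyph_ops` is a placeholder for whatever text-showing operators an
    exporter emits for the word; real glyph IDs depend on the embedded font.
    """
    # PDF text strings may be UTF-16BE with a leading byte order mark (FE FF),
    # written here as a hex string.
    hex_text = (b"\xfe\xff" + word.encode("utf-16-be")).hex().upper()
    return "/Span <</ActualText <" + hex_text + ">>> BDC\n" + glyph_ops + "\nEMC"


def per_word_spans(words_with_ops):
    """Emit one marked-content span per (word, glyph_ops) pair."""
    return "\n".join(actual_text_span(w, ops) for w, ops in words_with_ops)


if __name__ == "__main__":
    # Hypothetical glyph operators; only the ActualText values are meaningful.
    sample = [("नित्यानन्दकरी", "<0001000200030004> Tj"),
              ("वराभयकरी", "<00050006> Tj")]
    print(per_word_spans(sample))
```

With one span per word, a viewer that honours ActualText should in principle
return each word whole on copy, regardless of the glyph order inside it.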
Steps to Reproduce:
1. Use the following text for testing.
Devanagari Script –
Hindi, Sanskrit, Marathi, Nepali languages
नित्यानन्दकरी वराभयकरी सौन्दर्यरत्नाकरी । निर्धूताखिलघोरपावनकरी
प्रत्यक्षमाहेश्वरी ।।
अग्निशामक अभिज्ञान अनुक्रम काष्ठवाद्य अंतर्राष्ट्रीय ख़ूँखार मूत्रविज्ञान
द्विध्रुव
2. Open a new .odt file in LibreOffice, then copy and paste the above text.
3. Export to PDF.
4. Open the PDF in Acrobat Reader.
5. Copy the text and paste it into a text editor.
6. Compare with the original UTF-8 text.
Actual Results:
Devanagari Script –
Hindi, Sanskrit, Marathi, Nepali languages
नि त्यानन्दकरी वराभयकरी सौन्दर्यरत्नाकरी । नि र्धूताखि लघोरपावनकरी
प्रत्यक्षमाहेश्वरी ।।
अग्नि शामक अभि ज्ञान अनुक्रम काष्ठवाद्य अंतर्रा ष्ट्र ीय ख़ूँखार मूत्रवि ज्ञान
द्वि ध्रुव
Expected Results:
The copied text should be identical to the original:
Devanagari Script –
Hindi, Sanskrit, Marathi, Nepali languages
नित्यानन्दकरी वराभयकरी सौन्दर्यरत्नाकरी । निर्धूताखिलघोरपावनकरी
प्रत्यक्षमाहेश्वरी ।।
अग्निशामक अभिज्ञान अनुक्रम काष्ठवाद्य अंतर्राष्ट्रीय ख़ूँखार मूत्रविज्ञान
द्विध्रुव
Reproducible: Always
User Profile Reset: No
Additional Info:
The following wdiff output shows the difference.
======================================================================
[-नित्यानन्दकरी-]
{+नि त्यानन्दकरी+}
======================================================================
[-निर्धूताखिलघोरपावनकरी-] {+नि र्धूताखि लघोरपावनकरी+}
======================================================================
[-अग्निशामक अभिज्ञान-]
{+अग्नि शामक अभि ज्ञान+}
======================================================================
[-अंतर्राष्ट्रीय-] {+अंतर्रा ष्ट्र ीय+}
======================================================================
[-मूत्रविज्ञान द्विध्रुव-] {+मूत्रवि ज्ञान द्वि ध्रुव+}
======================================================================
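For anyone without wdiff, the same check can be sketched in Python. The
function below is a hypothetical helper, not part of LibreOffice or wdiff. It
walks the whitespace-separated tokens of the original and the copied text and
reports every original word that comes back as several pieces. It assumes the
only damage is extra spaces inside words; any other difference raises an error.

```python
from typing import List, Tuple


def find_split_words(original: str, extracted: str) -> List[Tuple[str, List[str]]]:
    """Return (word, pieces) for each original word that was split by spaces."""
    ext_tokens = extracted.split()
    splits = []
    i = 0  # index of the next unconsumed token in the extracted text
    for word in original.split():
        pieces = []
        joined = ""
        # Consume extracted tokens while they keep rebuilding the original word.
        while (i < len(ext_tokens) and joined != word
               and word.startswith(joined + ext_tokens[i])):
            joined += ext_tokens[i]
            pieces.append(ext_tokens[i])
            i += 1
        if joined != word:
            raise ValueError("texts differ by more than inserted spaces at: " + word)
        if len(pieces) > 1:
            splits.append((word, pieces))
    return splits


if __name__ == "__main__":
    original = "अग्निशामक अभिज्ञान अनुक्रम"
    extracted = "अग्नि शामक अभि ज्ञान अनुक्रम"
    for word, pieces in find_split_words(original, extracted):
        print(word, "->", " | ".join(pieces))
```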
Please see https://bugs.documentfoundation.org/attachment.cgi?id=141808 for
more examples with many other Indic/Complex scripts.
I tested with
> Version: 6.1.0.0.alpha1+ (x64)
> Build ID: 5f2073fbc995fb619f398a55187413813578b62e
> CPU threads: 4; OS: Windows 10.0; UI render: default;
> TinderBox: Win-x86_64@42, Branch:master, Time: 2018-04-30_00:51:08
> Locale: en-IN (en_IN); Calc: group
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML,
like Gecko) Chrome/66.0.3359.139 Safari/537.36