https://bugs.documentfoundation.org/show_bug.cgi?id=117428

            Bug ID: 117428
           Summary: add an option to PDF export dialog to do ActualText
                    per word
           Product: LibreOffice
           Version: 6.1.0.0.alpha1+ Master
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: medium
         Component: Printing and PDF export
          Assignee: libreoffice-bugs@lists.freedesktop.org
          Reporter: shreesh...@gmail.com

Description:
Khaled Hosny added a feature in 6.1.0 that uses ActualText so that text can be
copied and extracted from exported PDFs. However, it does not yet work
completely for complex scripts.

Emitting ActualText per word has been suggested as a possible solution. Khaled
has suggested offering this as an option in the PDF export dialog rather than
making it the default.
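
To illustrate what "ActualText per word" could look like in a PDF content
stream, here is a minimal sketch. It is not LibreOffice's implementation. It
only relies on the PDF marked-content syntax (/Span with an ActualText entry,
BDC ... EMC) and on PDF text strings being allowed as UTF-16BE with a leading
byte order mark. The function names and the glyph operator placeholder are
hypothetical.

```python
def actualtext_span(word: str, glyph_ops: str) -> str:
    """Wrap glyph-showing operators in a /Span marked-content sequence
    whose ActualText entry carries the Unicode text of one word."""
    # PDF text string as a hex string: UTF-16BE preceded by the BOM FE FF.
    hex_text = (b"\xfe\xff" + word.encode("utf-16-be")).hex().upper()
    return f"/Span <</ActualText <{hex_text}> >> BDC\n{glyph_ops}\nEMC"


def spans_per_word(text: str,
                   placeholder_ops: str = "% glyph operators for this word") -> list[str]:
    """Build one marked-content span per whitespace-separated word.
    placeholder_ops stands in for the real glyph-showing operators (assumption)."""
    return [actualtext_span(word, placeholder_ops) for word in text.split()]


if __name__ == "__main__":
    for span in spans_per_word("अग्निशामक अभिज्ञान"):
        print(span)
```

With one span per word, a viewer that honors ActualText would take the word's
text from the span instead of reconstructing it from glyph positions.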

Steps to Reproduce:
1. Use the following text for testing:
Devanagari Script – 
Hindi, Sanskrit, Marathi, Nepali languages
नित्यानन्दकरी वराभयकरी सौन्दर्यरत्नाकरी । निर्धूताखिलघोरपावनकरी
प्रत्यक्षमाहेश्वरी ।। 
अग्निशामक अभिज्ञान अनुक्रम काष्ठवाद्य अंतर्राष्ट्रीय ख़ूँखार मूत्रविज्ञान
द्विध्रुव 
2. Open a new .odt file in LibreOffice, then copy and paste the above text.
3. Export to PDF.
4. Open the PDF in Acrobat Reader.
5. Copy the text and paste it into a text editor.
6. Compare with the original UTF-8 text.
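
The comparison in the last step can also be scripted. The following is a
minimal sketch, assuming both texts are available as Python strings; the
function name and the three result labels are hypothetical. It normalizes both
sides to NFC, ignores line wrapping, and reports whether the texts differ only
in whitespace.

```python
import unicodedata


def compare_extracted(original: str, extracted: str) -> str:
    """Classify how the text copied from the PDF differs from the original."""
    a = unicodedata.normalize("NFC", original)
    b = unicodedata.normalize("NFC", extracted)
    # Same words in the same order: only line wrapping or spacing width differs.
    if a.split() == b.split():
        return "identical"
    # Same characters once all whitespace is removed: words were split or merged.
    if "".join(a.split()) == "".join(b.split()):
        return "whitespace-only differences"
    return "character-level differences"


if __name__ == "__main__":
    original = "अग्निशामक अभिज्ञान"
    extracted = "अग्नि शामक अभि ज्ञान"
    print(compare_extracted(original, extracted))
```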

Actual Results:  
Devanagari Script –
Hindi, Sanskrit, Marathi, Nepali languages
नि त्यानन्दकरी वराभयकरी सौन्दर्यरत्नाकरी । नि र्धूताखि लघोरपावनकरी
प्रत्यक्षमाहेश्वरी ।।
अग्नि शामक अभि ज्ञान अनुक्रम काष्ठवाद्य अंतर्रा ष्ट्र ीय ख़ूँखार मूत्रवि ज्ञान
द्वि ध्रुव

Expected Results:
The copied text should be identical to the original:

Devanagari Script – 
Hindi, Sanskrit, Marathi, Nepali languages
नित्यानन्दकरी वराभयकरी सौन्दर्यरत्नाकरी । निर्धूताखिलघोरपावनकरी
प्रत्यक्षमाहेश्वरी ।। 
अग्निशामक अभिज्ञान अनुक्रम काष्ठवाद्य अंतर्राष्ट्रीय ख़ूँखार मूत्रविज्ञान
द्विध्रुव 


Reproducible: Always


User Profile Reset: No



Additional Info:
The following wdiff output shows the differences.

======================================================================

[-नित्यानन्दकरी-]
{+नि त्यानन्दकरी+}
======================================================================
 [-निर्धूताखिलघोरपावनकरी-] {+नि र्धूताखि लघोरपावनकरी+}
======================================================================

[-अग्निशामक अभिज्ञान-]
{+अग्नि शामक अभि ज्ञान+}
======================================================================
 [-अंतर्राष्ट्रीय-] {+अंतर्रा ष्ट्र ीय+}
======================================================================
 [-मूत्रविज्ञान द्विध्रुव-] {+मूत्रवि ज्ञान द्वि ध्रुव+}
======================================================================

Please see https://bugs.documentfoundation.org/attachment.cgi?id=141808 for
more examples with many other Indic/Complex scripts.
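
For readers without wdiff, a similar word-level diff can be approximated with
Python's standard difflib module. This is a rough sketch, not a replacement for
wdiff: the function name is hypothetical, and it joins everything into a single
line instead of preserving the original line breaks.

```python
import difflib


def word_diff(original: str, extracted: str) -> str:
    """Return a wdiff-style diff: [-removed-] and {+added+} word runs."""
    a, b = original.split(), extracted.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(" ".join(a[i1:i2]))
            continue
        if i2 > i1:  # words present only in the original
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words present only in the extracted text
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    print(word_diff("अग्निशामक अभिज्ञान अनुक्रम",
                    "अग्नि शामक अभि ज्ञान अनुक्रम"))
```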

I tested with:
> Version: 6.1.0.0.alpha1+ (x64)
> Build ID: 5f2073fbc995fb619f398a55187413813578b62e
> CPU threads: 4; OS: Windows 10.0; UI render: default; 
> TinderBox: Win-x86_64@42, Branch:master, Time: 2018-04-30_00:51:08
> Locale: en-IN (en_IN); Calc: group



User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML,
like Gecko) Chrome/66.0.3359.139 Safari/537.36

-- 
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
Libreoffice-bugs mailing list
Libreoffice-bugs@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
