[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
https://bugs.documentfoundation.org/show_bug.cgi?id=62846

V Stuart Foote changed:

What    |Removed |Added
See Also|        |https://bugs.documentfoundation.org/show_bug.cgi?id=124191

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
Libreoffice-bugs mailing list
Libreoffice-bugs@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
Khaled Hosny changed:

What      |Removed |Added
Status    |NEW     |RESOLVED
Blocks    |66597   |
Resolution|---     |DUPLICATE

--- Comment #57 from Khaled Hosny ---
We have one common code path for Graphite and non-Graphite fonts now, so whatever the fix for bug 66597 is, it should work here too.

*** This bug has been marked as a duplicate of bug 66597 ***

Referenced Bugs:
https://bugs.documentfoundation.org/show_bug.cgi?id=66597
[Bug 66597] Problems with copying and extracting text from generated PDF
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #56 from martin_hos...@sil.org ---
(In reply to shreeshrii from comment #54)
> The problem of copying text from pdfs created with unicode fonts for complex
> scripts has been solved by Jonathan Kew by use of actualtext in xelatex.
>
> It uses the new \XeTeXgenerateactualtext feature - please see
> http://tug.org/pipermail/xetex/2016-February/026445.html for the
> announcement.
>
> Is it possible to use a similar approach for Libre Office?

No. XeTeX is XeTeX and libo, libo. They are completely different animals, with completely different processing engines and PDF output mechanisms. There is no overlap. All XeTeX is doing is inserting /ActualText elements, just as I suggested a while back (see comment #48).

This will require some programming from someone who has the time to do it. Either that, or you can pay one of the consulting companies to do it. Since this is a new feature, no amount of complaining, or of arguing that it is a regression on some font or other, is going to fix it. The only way forward on this bug is for someone to commit code that adds the capability to libo.
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #55 from shreesh...@gmail.com ---
Please also see https://bugs.documentfoundation.org/show_bug.cgi?id=66597#c20

Comment #20 on bug 66597 from Khaled Hosny:
> LibreOffice has limited support for actual text already and I think it
> shouldn’t be hard to extend it and make it an option at least.
>
> If someone is interested in giving this a try, check SetActualText() calls in
> sw/source/core/text/EnhancedPDFExportHelper.cxx.
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #54 from shreesh...@gmail.com ---
The problem of copying text from pdfs created with unicode fonts for complex scripts has been solved by Jonathan Kew by use of actualtext in xelatex.

It uses the new \XeTeXgenerateactualtext feature - please see http://tug.org/pipermail/xetex/2016-February/026445.html for the announcement.

Is it possible to use a similar approach for Libre Office?
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #53 from Jonathan ---
Thanks for the update, martin_hos...@sil.org.

Personally, I concur with the previous comment in that I don't have a strong preference. Neither space nor time is a constraint, but having a searchable PDF is essential. If it came to it, getting the PDF right is more important than speed, so I'd go with the slow and small option.

I might repeat that by manually editing my PDF (I forget how I did it; this was years ago) I managed to fix the glyph mapping and make it correctly searchable. I'm not sure what this says about the time/space trade-off you mention, but to my naive interpretation it makes the current implementation look more like a bug than a design flaw.
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #52 from Volga ---
(In reply to martin_hosken from comment #51)
> One of the difficulties with attaching text to a PDF text run is that the
> text has to be output before the glyphs that give the presentation. So there
> are a number of tradeoffs we can employ in resolving this. So I'll ask,
> which you prefer:

I have no preference; I was only reporting what I saw. I reproduced the problem simply by clicking “Export as PDF” on the toolbar. Sorry.
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #51 from martin_hos...@sil.org ---
Sorry to be somewhat brutal. But until we get the PDF writer to produce the PDF structures needed for data extraction, using tagged PDF, it doesn't matter what magic we do with our fonts: it isn't going to work. You can give example after example; it won't help fix the problem.

One of the difficulties with attaching text to a PDF text run is that the text has to be output before the glyphs that give the presentation. So there are a number of tradeoffs we can employ in resolving this. So I'll ask, which do you prefer: speed or size? Do you want small PDFs that only output Unicode strings for runs that really need them, but take a bit longer to produce (since the strings have to be analysed to make the decision), or are you OK with having a complete copy of the text in your PDF? Do we want to make this an option that says "make me an extractable PDF", or do we always want to generate extractable PDF even if the result is bigger or slower to produce?
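To make the "small but slower" option in comment #51 concrete, here is a minimal Python sketch of the per-run analysis it implies. It is not LibreOffice code. The function name `needs_actual_text`, the glyph IDs and the mapping table are all hypothetical; the only assumption taken from the thread is that a PDF reader recovers text by concatenating each glyph's Unicode mapping in glyph order.

```python
from typing import Dict, List


def needs_actual_text(text: str, glyphs: List[int], to_unicode: Dict[int, str]) -> bool:
    """Return True when per-glyph mappings alone cannot reproduce `text`.

    `to_unicode` stands in for the font's glyph-to-Unicode table in the PDF
    (hypothetical data here). A reader that lacks any extra text attached to
    the run can only concatenate these per-glyph strings in glyph order.
    """
    recovered = "".join(to_unicode.get(glyph, "") for glyph in glyphs)
    return recovered != text


# One glyph standing for two characters (a Latin "fi" ligature) round-trips,
# so this run would not need a copy of its text.
latin_ok = needs_actual_text("fi", [10], {10: "fi"})

# One character drawn with two glyphs (a base shape plus a separate nuqta)
# does not round-trip: glyph 20 maps to a dotless base letter and glyph 21
# has no Unicode value of its own in this made-up table.
arabic_broken = needs_actual_text("\u0628", [20, 21], {20: "\u066e"})
```

Running this check on every run is the analysis cost the comment refers to. The "big but fast" option skips the check and attaches the text to every run.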
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #50 from Volga ---
Created attachment 136058
  --> https://bugs.documentfoundation.org/attachment.cgi?id=136058&action=edit
Problem with Cyrillic

The problem still appears with Cyrillic. I installed Ponomar Unicode and its TTF version (Ponomar Unicode TT) on my computer, copied a sample text from http://sci.ponomar.net/fonts.html twice, and set one copy in each font. After I export to PDF and copy the text, I get the following result:

Ponomar Unicode
Хрⷭ ҇ то́ съ воскре́ се и҆ з̾ ме́ ртвыхъ, сме́ ртїю сме́ рть попра́ въ, и҆ сꙋ́ щымъ во гробѣхъ иво́ тъ дарова́ въ.

Ponomar Unicode TT
Хртоосъ воскреосе иизз еортвыхъ, с еортїю с еорть попраовъ, ии сꙋ о щы ъ во гробѣхъ живоотъ дароваовъ.
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
https://bugs.documentfoundation.org/show_bug.cgi?id=62846 --- Comment #49 from Volga --- Created attachment 136057 --> https://bugs.documentfoundation.org/attachment.cgi?id=136057&action=edit Awami Nastaliq Type Sample generated by LibreOffice 5.4 I have already got the font package from SIL (noted in comment 46), then I extract the sample ODF, open with LibreOffice 5.4.1, expert as PDF. When I get the PDF file, I copy the Urdu UDHR again, the character mapping seems better, but many words are deformed and not correctly handling its direction. 版本:5.4.1.2 (x64) Build ID:ea7cb86e6eeb2bf3a5af73a8fac570321527 CPU 线程:4; 操作系统:Windows 6.19; UI 渲染:默认; 区域语言:zh-CN (zh_CN); Calc: group Here is what I copied from self generated PDF. Awami Running Text One paragraph from Urdu UDHR ے ن ن ی ل ب ڋ مس نرل ا ن ی جڋ ک حڋک ه ت ق م قِوما ق ا ۰۱ ؍دسمڋنر ۸۴۹۱ با۔ ي ا اعلان عا مکک ک ے اس ک ک ږ ک ک ظوک ر ن ن ن ور" م ش ش ن ن م يم ل ا اع ک قوک ق ق ح ین ن اس ن ن ا " ا ک ء ک بږ زور الك پ ما ممڋنر مم مت ق ے ي پ پ ن ے ا ن ن ی ل ب ڋ مس عڋ ا ب ے ڋ ک ک ے اب م ن را ک خي ک ن ی ي بار ق ے۔ سا ہ ن درج ن ت ق ل م م ک م ا ک ور ک ش ش ن م بږ سا ات پ ح ف ن ص ے ل گ کا باں ي تما ن ے یہ ککہ ا س ي ً بلا ش نن۔ م ي ہ ل ّ ص ح نت ب م ق عا ش ش شر و ا ن ن ی ک ین اور اس ک ي ږ ک عام ک ِ ا اعلان ک ہاں اس ک ہ ے ي پ پ ن ے ا ي پ پ ھي ا ب ڋ با ککہ هو ي د ی ک ے اور اس ک ن ◌ ا ج ڋ اب ي اب ن س ږ ک ک ھږ ڋ ب ے پ تن ا س ي مي اداروں م ی ي ل ع ب ق ولوں اور ک ک ش بږ ا اص طور پ ج ن ے۔ اور ن ◌ ا ج ڋ اب ي ږا ںکک ن ب ي بږ وآ قامات پ ق م اب ق یہ ڋبږ ن باز ي ت ق م نی ا ◌ و ک ک ے اظ س ح ل ے ک ک ب ق ثيب ش یي ح بايس س ی ک ک ے ق ق با علا ي ملك ي س ک ک نت ي ن م ن م ض ن نن، اور سا ي ن ◌ ا ج ڋ ی ک ضج ک ن بلات او ي ص ف ن ت ق ے ن ◌ ا ج ڋ -- You are receiving this mail because: You are the assignee for the bug.___ Libreoffice-bugs mailing list Libreoffice-bugs@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #48 from martin_hos...@sil.org ---
I lied. It's not producing good text, even if it is somewhat Arabic-like. For a start, the text seems to be backwards.

Here's what is going on. Inside the PDF there is a 1:n mapping between glyphs and characters. That is destined to fail right there, because if you break off your nuqtas you are in for trouble. So, while libo does the best it can, the results are going to be really bad regardless. This has nothing to do with Graphite vs HarfBuzz, since by the time the PDF is written everything has been shaped into the same structures. It's just the nature of the problem that PDF cannot map n:1 glyphs:chars on output, especially for the case [xy]:z and x:w.

The only way to do this properly is to output the Unicode text along with the glyphed text as part of the PDF page stream. One way might be, in vcl/source/gdi/pdf_impl.cxx, to have another MARK() function that takes an OUString&, nIndex and nLen and outputs that as the /ActualText as part of the structure element dictionary in the /Span. This would only get output if structured marking was turned on. I'm not sure whether there would need to be any other limiting factors, such as the text containing CTL codepoints.

Suffice it to say that libo isn't up to handling CTL text for text export from PDF. But let's not blame libo too much. This is really a bug in PDF, since the PDF specification only allows 1:n glyph:char mapping. All very Latin-centric ;)
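As an illustration of what "output the unicode text along with the glyphed text" could look like in a page stream, here is a minimal Python sketch. It is not the proposed MARK() function and not LibreOffice code: the helper names `actual_text_hex` and `wrap_span`, the glyph code `<0012>` and the sample operators are hypothetical. The sketch attaches /ActualText through an inline marked-content property list on a /Span (BDC ... EMC), which is a simpler variant than the structure element dictionary the comment describes.

```python
def actual_text_hex(text: str) -> str:
    """Encode `text` as a PDF hex string: UTF-16BE preceded by a byte order mark."""
    return "<" + ("\ufeff" + text).encode("utf-16-be").hex().upper() + ">"


def wrap_span(text: str, glyph_ops: str) -> str:
    """Wrap glyph-showing operators in a /Span that carries the original text.

    `glyph_ops` is whatever the writer would emit anyway to draw the run.
    The text is written first and the glyphs follow inside the span.
    """
    return f"/Span <</ActualText {actual_text_hex(text)}>> BDC\n{glyph_ops}\nEMC"


# Hypothetical run: the letter U+0628 drawn by a single made-up glyph code.
fragment = wrap_span("\u0628", "<0012> Tj")
print(fragment)
```

A reader that honours /ActualText can then return the attached string for the whole span instead of assembling text glyph by glyph, which avoids the n:1 case entirely.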
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #47 from martin_hos...@sil.org ---
Looks like this is fixed in 5.4. I ran a test with three fonts (NotoNastaliqUrdu, Awami Nastaliq and Scheherazade), and in each case text copied from the PDF came out as Arabic text, even with the correct characters with nuqtas. That is pretty amazing, given that the Awami font doesn't have appropriately named glyphs and also decomposes its nuqtas.
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
https://bugs.documentfoundation.org/show_bug.cgi?id=62846 --- Comment #46 from Volga --- This bug still affect LO 5.3. SIL Awami Nastaliq website has a font type sample , this sample produced with LibreOffice 5.3.1.2, when I open the file, copy Urdu text from page 5, I get the following text: Awami Running Text One paragraph from Urdu UDHR À à ¢ Õ Œ œ ö – — ý “ ” ¢ ‘ ö Õ ’ ÷ ô ◊ ÿ ∞ Ÿ / " ý ⁄ ¤ ∞ ý ~ “ ” ö ‹ — õ áÇÜ ï x› œ û fi ÷ "› fl ! ‡ · fl ý › ‚ ÷ òý À „ ÷ ∆ ‰ ÷ ó Â Ê ¢ Á ¢ Ÿ Ë ó Â È Í Î ¢ Ï ù Ì Ó › fl › ‚ ÷  ± ∞ Ô Õ ¢ › Ò Ú ¢ ý Ë › ‚ ÷ ó Û ∆ « Ù ́ ™ › ı ˆ “ ” ö ‹ ß "› ı ̃ ∞ À † Ù ̄ ¢ ý À à ¢ Õ Œ œ ö – — ý ô ̆ ̇ ö À „ ÷ À ̊ › ú ¢ ó › ‚ ÷ ù ̧ ¢ ̋ û ó › ú ∞ òý xÀ ̨ øóõ ° ¢ ∞ ® ı ÷ › ‚ ÷ ó Â È Í Î ¢ Ï òý ∆ « Ù ú › ◊ ¢ À ÷ ý p› ú û › ı ̃ ¢ À ý ÷ û 4 ‡ · œ Í x° £ û . ° û ƒ ∞ › Í ý “ Í Ú ¢ Õ ’ ÷ òý ó ý ° û ∆ ‰ ÷ "› fl / ! ‡ · fl ý › ‚ ÷ òý p› À † Ù ̄ ¢ ý À † Ù ̄ ¢ ý ù ö ÷ › ú û õ Õ ’ ÷ òý ó ý À à + › ö › ú û › œ ¢ ∆ ‰ ÷ m∆ õ « Ù À ý ° û p óýõý ù Ì û ̆ ̇ ∞ ó ý p⁄ ⁄ ÷ ý ∆ « Ù ó  › ¢ ó ý xÀ à + › ö › œ û fi ÷ p ý ∆ ¢ « û ∆ « Ù ú › › ∞ ! À à + › ö › ú ∞ ∆ « ö ¢ Û› œ û " ∞ Ï ý Õ + ⁄ # ÷ À › ◊ $ À „ ÷ ƒ ∞ ≈ û % Í & û ' ù ( › œ û Õ ’ ÷ À ) ∞ ‡ · fl › ú û ́ © ù * + ÷ ° û ° ¢ , - ¢ òý ó ý ° £ û § + › ö Õ ’ ÷ ¡ . ¢ ý ú ‡ · œ û / 0 ¢ 1 ∞ This document is directly available in http://software.sil.org/awami/design/ , also available in their download page http://software.sil.org/awami/download/ -- You are receiving this mail because: You are the assignee for the bug.___ Libreoffice-bugs mailing list Libreoffice-bugs@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #45 from Vera ---
I can confirm that the bug is present in LibreOffice 5.2.2.2 in Ubuntu 16.04.
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #44 from Jonathan ---
I can confirm that the bug is present and the behaviour unchanged in version 5.2.0.4 (Debian build ID 1:5.2.0-2) which is the version installed on my work notebook. I am away from my development machine and unable to test a more recent or upstream version for another 2 weeks.
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
--- Comment #43 from QA Administrators ---
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present on a currently supported version of LibreOffice (5.1.5 or 5.2.1): https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and Operating System.

Please DO NOT:
- Update the version field
- Reply via email (please reply directly on the bug tracker)
- Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:

1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to "inherited from OOo";
4b. If the bug was not present in 3.3 - add "regression" to keyword

Feel free to come ask questions or to say hello in our QA chat: http://webchat.freenode.net/?channels=libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug-20160920
[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)
Gerry changed:

What   |Removed                                     |Added
Summary|Incorrect glyph to Unicode mappings in PDFs |Incorrect glyph to Unicode mappings in PDFs (Graphite)

--- Comment #42 from Gerry ---
(In reply to László Németh from comment #41)
> @Martin, @Gerry: Many thanks for the tests. I think, it's possible to close
> this issue now, thanks to Martin's LibreOffice fix, and I will fix the
> Graphite font problem with numbers in the next Linux Libertine/Biolinum G
> release in the near future.

@László: I just wanted to ask you when you plan to update the Linux Libertine/Biolinum G fonts to fix the wrong glyph mapping in the PDF output. Shall the bug be closed already now or after the new font versions are out? Thanks!