[digikam] [Bug 472692] Tesseract OCR does not take language selection into account
https://bugs.kde.org/show_bug.cgi?id=472692

Maik Qualmann changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REPORTED                    |RESOLVED
   Version Fixed In|                            |8.2.0
      Latest Commit|                            |https://invent.kde.org/graphics/digikam/-/commit/21ef0f72c6af7be18c6f5ae57159e50a5b4c894f

--- Comment #6 from Maik Qualmann ---
Git commit 21ef0f72c6af7be18c6f5ae57159e50a5b4c894f by Maik Qualmann.
Committed on 29/07/2023 at 19:44.
Pushed by mqualmann into branch 'master'.

the DBInfoIface must also write metadata via MetadataHub
FIXED-IN: 8.2.0

M  +1    -1    NEWS
M  +25   -4    core/libs/database/utils/ifaces/dbinfoiface.cpp

https://invent.kde.org/graphics/digikam/-/commit/21ef0f72c6af7be18c6f5ae57159e50a5b4c894f

--
You are receiving this mail because:
You are watching all bug changes.
[digikam] [Bug 472692] Tesseract OCR does not take language selection into account
https://bugs.kde.org/show_bug.cgi?id=472692

--- Comment #5 from Maik Qualmann ---
OK, the encoding is fine on Windows now. We still have to fix writing the OCR text to the metadata. At the moment it is only written to the DB, so a rescan ends up restoring the original caption text. And for an image that already contains a caption text, like this sample image, we would overwrite it with the OCR text. Here we either have to merge the two or think of something else.

Maik
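One way to picture the "merge" option mentioned in this comment is sketched below in Python. It is purely hypothetical and says nothing about how digiKam resolved the question: the function `merge_caption`, its separator, and its rules are invented for illustration.

```python
def merge_caption(existing: str, ocr_text: str, separator: str = "\n\n") -> str:
    """Append OCR text to an existing caption instead of overwriting it.

    Hypothetical policy, not digiKam's behaviour:
    - an empty caption is simply replaced by the OCR text;
    - OCR text already contained in the caption is not appended again,
      so running OCR twice does not duplicate it.
    """
    existing = existing.strip()
    ocr_text = ocr_text.strip()
    if not existing:
        return ocr_text
    if not ocr_text or ocr_text in existing:
        return existing
    return existing + separator + ocr_text
```

With this policy, a caption such as "Holiday 2023" and an OCR result "Willkommen" would be stored together, and repeating the OCR run would leave the merged caption unchanged.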
[digikam] [Bug 472692] Tesseract OCR does not take language selection into account
https://bugs.kde.org/show_bug.cgi?id=472692

--- Comment #4 from Maik Qualmann ---
Git commit cc42ef72e33356f66ec132e96cbb684d3c8d28bc by Maik Qualmann.
Committed on 28/07/2023 at 08:08.
Pushed by mqualmann into branch 'master'.

according to Tesseract doc the output encoding should be UTF8

M  +1    -1    core/dplugins/generic/tools/ocrtextconverter/ocrtesseractengine.cpp

https://invent.kde.org/graphics/digikam/-/commit/cc42ef72e33356f66ec132e96cbb684d3c8d28bc
[digikam] [Bug 472692] Tesseract OCR does not take language selection into account
https://bugs.kde.org/show_bug.cgi?id=472692

--- Comment #3 from Maik Qualmann ---
OK, we're a big step further: the language setting works and we get a text with German umlauts, but in the Windows codepage format rather than UTF-8. This looks correct when we view the text file in the Windows text editor, but not in our preview. The question now is: do we want codepage or UTF-8 on Windows?

Maik
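The mismatch described in this comment can be reproduced outside digiKam. The following is a minimal Python sketch, not digiKam code. It assumes the Windows codepage in question is cp1252 (the comment does not name it), and the sample string and the helper `decode_ocr_output` are hypothetical.

```python
# Illustration only; not taken from digiKam. cp1252 is an assumed codepage.
SAMPLE = "Größe über Äpfel"  # hypothetical OCR result with German umlauts

cp1252_bytes = SAMPLE.encode("cp1252")  # bytes as a codepage-based writer would emit them
utf8_bytes = SAMPLE.encode("utf-8")     # bytes as a UTF-8 consumer expects them

# Reading the codepage bytes as UTF-8 fails on the umlauts; with
# errors="replace" each invalid sequence becomes U+FFFD instead of raising.
garbled_preview = cp1252_bytes.decode("utf-8", errors="replace")


def decode_ocr_output(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 and fall back to a legacy codepage if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)
```

A fallback decoder like this only papers over the problem on the reading side. Having the producer emit UTF-8 on every platform avoids the guesswork altogether.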
[digikam] [Bug 472692] Tesseract OCR does not take language selection into account
https://bugs.kde.org/show_bug.cgi?id=472692

--- Comment #2 from Maik Qualmann ---
Git commit 5918439aafb5b2f7387490cb2abc9178fe33f374 by Maik Qualmann.
Committed on 27/07/2023 at 20:49.
Pushed by mqualmann into branch 'master'.

fix language parameter for Tesseract OCR on Windows

M  +11   -0    core/dplugins/generic/tools/ocrtextconverter/tesseractbinary.cpp

https://invent.kde.org/graphics/digikam/-/commit/5918439aafb5b2f7387490cb2abc9178fe33f374
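For context, the Tesseract command-line tool selects its recognition language with the `-l` option, for example `-l deu` for German, and several languages can be joined with `+`. The Python sketch below only shows how such a command line could be assembled. It is not the code changed in this commit, and `build_tesseract_command` and the file names are hypothetical.

```python
from typing import List, Sequence


def build_tesseract_command(image_path: str,
                            output_base: str,
                            languages: Sequence[str]) -> List[str]:
    """Build an argument list of the form: tesseract IMAGE OUTPUTBASE -l LANG.

    Hypothetical helper for illustration. Returning a list (rather than one
    shell string) avoids quoting problems with paths that contain spaces.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]
```

For example, `build_tesseract_command("scan.png", "scan", ["deu"])` yields an argument list that could be handed to `subprocess.run`, assuming a `tesseract` binary and the German language data are installed.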
[digikam] [Bug 472692] Tesseract OCR does not take language selection into account
https://bugs.kde.org/show_bug.cgi?id=472692

Maik Qualmann changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |metzping...@gmail.com

--- Comment #1 from Maik Qualmann ---
I tested it here under Linux; the Windows test will follow. If I select German as the language, I get correct text with German umlauts.

Maik