[digikam] [Bug 472692] Tesseract OCR does not take language selection into account

2023-07-29 Thread Maik Qualmann
https://bugs.kde.org/show_bug.cgi?id=472692

Maik Qualmann changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REPORTED                    |RESOLVED
   Version Fixed In|                            |8.2.0
      Latest Commit|                            |https://invent.kde.org/graphics/digikam/-/commit/21ef0f72c6af7be18c6f5ae57159e50a5b4c894f

--- Comment #6 from Maik Qualmann  ---
Git commit 21ef0f72c6af7be18c6f5ae57159e50a5b4c894f by Maik Qualmann.
Committed on 29/07/2023 at 19:44.
Pushed by mqualmann into branch 'master'.

the DBInfoIface must also write metadata via MetadataHub
FIXED-IN: 8.2.0

M  +1    -1    NEWS
M  +25   -4    core/libs/database/utils/ifaces/dbinfoiface.cpp

https://invent.kde.org/graphics/digikam/-/commit/21ef0f72c6af7be18c6f5ae57159e50a5b4c894f

-- 
You are receiving this mail because:
You are watching all bug changes.

[digikam] [Bug 472692] Tesseract OCR does not take language selection into account

2023-07-28 Thread Maik Qualmann
https://bugs.kde.org/show_bug.cgi?id=472692

--- Comment #5 from Maik Qualmann  ---
OK, the encoding is fine on Windows now. We still have to fix the writing of the
OCR text to the metadata. At the moment it is only written to the DB, so a
rescan ends up restoring the original caption text. And for an image like this
sample, which already contains a caption text, we would overwrite that caption
with the OCR text. Here we either have to merge or think of something else.
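
One possible reading of the "merge" option, as a minimal sketch only: keep an
existing caption and append the OCR text after it, unless the caption already
contains that text. The function name `merge_caption` and the blank-line
separator are assumptions made for illustration; this is not what digiKam
implements.

```python
def merge_caption(existing: str, ocr_text: str, separator: str = "\n\n") -> str:
    """Combine an existing caption with OCR text without losing either.

    Hypothetical helper: the name, the separator, and the policy itself are
    illustrative assumptions, not digiKam behaviour.
    """
    existing = existing.strip()
    ocr_text = ocr_text.strip()

    # No caption yet: the OCR text simply becomes the caption.
    if not existing:
        return ocr_text

    # Nothing recognised, or the OCR text was already merged earlier:
    # leave the caption untouched so repeated runs do not duplicate it.
    if not ocr_text or ocr_text in existing:
        return existing

    # Otherwise keep the user's caption first and append the OCR text.
    return existing + separator + ocr_text
```

Checking for the OCR text inside the existing caption makes the operation
idempotent, so running OCR twice on the same image does not append the text a
second time.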

Maik

[digikam] [Bug 472692] Tesseract OCR does not take language selection into account

2023-07-28 Thread Maik Qualmann
https://bugs.kde.org/show_bug.cgi?id=472692

--- Comment #4 from Maik Qualmann  ---
Git commit cc42ef72e33356f66ec132e96cbb684d3c8d28bc by Maik Qualmann.
Committed on 28/07/2023 at 08:08.
Pushed by mqualmann into branch 'master'.

according to Tesseract doc the output encoding should be UTF8

M  +1    -1    core/dplugins/generic/tools/ocrtextconverter/ocrtesseractengine.cpp

https://invent.kde.org/graphics/digikam/-/commit/cc42ef72e33356f66ec132e96cbb684d3c8d28bc

[digikam] [Bug 472692] Tesseract OCR does not take language selection into account

2023-07-27 Thread Maik Qualmann
https://bugs.kde.org/show_bug.cgi?id=472692

--- Comment #3 from Maik Qualmann  ---
OK, we're a big step further: the language setting works and we get text with
German umlauts, but encoded in the Windows codepage rather than UTF-8. The text
file displays correctly in the Windows text editor, but not in our preview.
The question now is: do we want the codepage or UTF-8 on Windows?
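
The mismatch described here can be reproduced in a few lines. The sketch below
assumes the Windows codepage in question is cp1252 (the thread does not name
it) and uses a made-up sample string; it only illustrates why a reader that
expects UTF-8 shows broken umlauts when it is handed codepage bytes.

```python
# Sample text with German umlauts (illustrative, not from the bug report).
text = "Grüße aus München"

# The same text encoded two ways. cp1252 is an assumption: the thread only
# says "Windows codepage".
cp1252_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")


def decode_as_utf8(data: bytes) -> str:
    """Decode bytes the way a UTF-8-only preview would.

    Invalid byte sequences are replaced with U+FFFD instead of raising.
    """
    return data.decode("utf-8", errors="replace")


# Codepage bytes read as UTF-8: the umlauts turn into replacement characters.
garbled = decode_as_utf8(cp1252_bytes)

# UTF-8 bytes read as UTF-8: the text survives unchanged.
clean = decode_as_utf8(utf8_bytes)

# A codepage-aware reader (such as a native text editor) still gets it right.
native_view = cp1252_bytes.decode("cp1252")
```

Here `clean` and `native_view` equal the original text, while `garbled`
contains U+FFFD replacement characters where the umlauts were.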

Maik

[digikam] [Bug 472692] Tesseract OCR does not take language selection into account

2023-07-27 Thread Maik Qualmann
https://bugs.kde.org/show_bug.cgi?id=472692

--- Comment #2 from Maik Qualmann  ---
Git commit 5918439aafb5b2f7387490cb2abc9178fe33f374 by Maik Qualmann.
Committed on 27/07/2023 at 20:49.
Pushed by mqualmann into branch 'master'.

fix language parameter for Tesseract OCR on Windows

M  +11   -0    core/dplugins/generic/tools/ocrtextconverter/tesseractbinary.cpp

https://invent.kde.org/graphics/digikam/-/commit/5918439aafb5b2f7387490cb2abc9178fe33f374
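
For context, the Tesseract command-line tool takes the recognition language
through its `-l` option, with several languages joined by `+` (for example
`-l eng+deu`). The sketch below builds such an argument list; the helper name
`build_tesseract_args` and the file names are hypothetical, and it does not
reproduce what the commit above changed.

```python
from typing import List, Sequence


def build_tesseract_args(image_path: str,
                         output_base: str,
                         languages: Sequence[str]) -> List[str]:
    """Build a Tesseract CLI argument list (hypothetical helper).

    The resulting command has the form:
        tesseract <image> <outputbase> [-l lang1+lang2]
    """
    args = ["tesseract", image_path, output_base]

    # Only pass -l when a language was actually selected; without it,
    # Tesseract falls back to its default language.
    if languages:
        args += ["-l", "+".join(languages)]

    return args


# Example with German selected, as in the test described in this thread.
german_args = build_tesseract_args("scan.png", "scan_out", ["deu"])
```

Passing the arguments as a list rather than one shell string avoids quoting
problems with paths that contain spaces.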

[digikam] [Bug 472692] Tesseract OCR does not take language selection into account

2023-07-27 Thread Maik Qualmann
https://bugs.kde.org/show_bug.cgi?id=472692

Maik Qualmann changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |metzping...@gmail.com

--- Comment #1 from Maik Qualmann  ---
I tested it here under Linux; the Windows test will follow. If I select German
as the language, I get correct text with German umlauts.

Maik
