https://bugs.kde.org/show_bug.cgi?id=472692

            Bug ID: 472692
           Summary: Tesseract OCR does not take language selection into
                    account
    Classification: Applications
           Product: digikam
           Version: 8.2.0
          Platform: Microsoft Windows
                OS: Microsoft Windows
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: Plugin-Generic-OcrTextConverter
          Assignee: digikam-bugs-n...@kde.org
          Reporter: claus.peja+...@gmail.com
  Target Milestone: ---

Created attachment 160560
  --> https://bugs.kde.org/attachment.cgi?id=160560&action=edit
One of the files I want to extract text from.

SUMMARY
I use Tesseract OCR with digiKam 8.2.0 (20.07.2023) on Windows 10 Pro. I try to
get the text from a jpg. If I select 'Languages: Default', I get a result, but
German umlauts ä, ü, and ö are recognized incorrectly as a, u, and o, and, yes,
that's a difference in German ;-) .
But when I select 'Languages: deu', I get no result. No text is found at all.
Selecting another language, e.g. eng, also gives no result.
However, when I use Tesseract (v5.3.1.20230401) directly on the command line
with switch -l deu, it works.
Tesseract command that works:
  tesseract /dir/pic1.jpg /text/pic1.ocr-result -l deu
I attach one of the pictures I use. I marked the last sentence and the umlauts
in it.
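
For anyone who wants to reproduce the working command-line call outside of
digiKam, here is a minimal Python sketch. It is not how the digiKam plugin
invokes Tesseract; the helper names build_tesseract_command and run_ocr are
hypothetical. It only assumes the documented CLI form
"tesseract IMAGE OUTPUTBASE -l LANG" and that tesseract is on PATH.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, language=None):
    """Build the argument list for `tesseract IMAGE OUTPUTBASE [-l LANG]`.

    Hypothetical helper for illustration; not taken from digiKam.
    """
    command = ["tesseract", image_path, output_base]
    if language:
        # -l selects the recognition language, e.g. "deu" or "eng".
        command += ["-l", language]
    return command


def run_ocr(image_path, output_base, language=None):
    """Run Tesseract and return the CompletedProcess for inspection."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, language)
    # check=False so a non-zero exit code can be examined by the caller.
    return subprocess.run(command, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Same arguments as the command from this report.
    print(build_tesseract_command("/dir/pic1.jpg", "/text/pic1.ocr-result", "deu"))
```

Comparing this argument list with whatever the plugin passes when
'Languages: deu' is selected might help narrow down where the language
selection gets lost.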

STEPS TO REPRODUCE
1. Open the image attached in the 'OCR text converter...'
2. Select 'Languages: Default'. What you select for 'Segmentation mode' and
'Engine mode' makes no difference. DPI=72
3. Start OCR
4. Now you get the result without umlauts (ö ü)
5. Close OCR
6. Open the same image again in 'OCR text converter...'
7. Select 'Languages: deu'. What you select for 'Segmentation mode' and 'Engine
mode' makes no difference. DPI=72
8. Start OCR
9. Now you get no result

OBSERVED RESULT
With default, the sentence is scanned as: Die Giebel und Traufen konnen durch
Wind- bzw. Traufen- oder Tropf-bretter geschutzt werden.

EXPECTED RESULT
The correct sentence is: Die Giebel und Traufen können durch Wind- bzw.
Traufen- oder Tropf-bretter geschützt werden.
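
To make the difference between the two sentences explicit, here is a small
Python sketch that checks whether the observed text is just the expected text
with its diacritics removed. The helper name strip_diacritics is hypothetical,
and the check is only an illustration of the symptom, not a diagnosis.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks, so that e.g. an umlaut vowel loses its dots."""
    # NFD splits each accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


expected = ("Die Giebel und Traufen können durch Wind- bzw. "
            "Traufen- oder Tropf-bretter geschützt werden.")
observed = ("Die Giebel und Traufen konnen durch Wind- bzw. "
            "Traufen- oder Tropf-bretter geschutzt werden.")

# True when the only difference is the missing diacritics.
lost_only_diacritics = strip_diacritics(expected) == observed
print(lost_only_diacritics)
```

For the two sentences above the comparison holds, i.e. the 'Default' result
differs from the correct text only in the missing umlauts.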

SOFTWARE/OS VERSIONS
Windows 10, 22H2

ADDITIONAL INFORMATION
