One more thing: I used a language file from
https://github.com/tesseract-ocr/tessdata, i.e. one that includes the legacy
engine data.
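
If you prefer to stay in Python, a minimal sketch along these lines could run
the same command and parse the report. The function names and the dict keys
are only illustrative (the keys merely mimic what tesserocr returns), and it
assumes the tesseract executable is on PATH and that jpn.traineddata from the
tessdata repository is installed.

```python
import subprocess

# Labels printed by `tesseract --psm 0`, mapped to (dict key, type).
# The key names are an assumption chosen to resemble tesserocr's output.
_OSD_FIELDS = {
    "Orientation in degrees": ("orient_deg", int),
    "Rotate": ("rotate", int),
    "Orientation confidence": ("orient_conf", float),
    "Script": ("script_name", str),
    "Script confidence": ("script_conf", float),
}


def parse_osd(text):
    """Parse the 'Label: value' lines of an OSD report into a dict."""
    result = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        field = _OSD_FIELDS.get(label.strip())
        if sep and field:  # skip warnings and unknown labels
            key, cast = field
            result[key] = cast(value.strip())
    return result


def detect_orientation(image_path, lang="jpn"):
    """Run `tesseract <image> - --psm 0 -l <lang>` and parse its report."""
    proc = subprocess.run(
        ["tesseract", image_path, "-", "--psm", "0", "-l", lang],
        capture_output=True, text=True, check=True,
    )
    # Parse both streams so it does not matter where the report is printed.
    return parse_osd(proc.stdout + "\n" + proc.stderr)
```

Calling `detect_orientation("unnamed.jpg")` would then return a dict with
whichever of the fields above tesseract printed; a low `orient_conf` is a hint
that the result should not be trusted blindly.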

Zdenko


On Sat, Mar 11, 2023 at 13:18 nguyen ngoc hai <[email protected]> wrote:

> Thank you very much for your help.
> I will give it a try.
>
> Best regards
> Hai
>
>
> On Sat, Mar 11, 2023, 8:14 PM Zdenko Podobny <[email protected]> wrote:
>
>> I am using the latest code (5.3.0) on Windows.
>>
>> Zdenko
>>
>>
>> On Sat, Mar 11, 2023 at 2:16 nguyen ngoc hai <[email protected]> wrote:
>>
>>> Dear Zdenko,
>>>
>>> Thank you very much for your suggestion.
>>>
>>> May I ask which version of tesseract you are using?
>>> I ran the same command with tesseract v5.0.0, but I got a different
>>> result.
>>>
>>> ```
>>> >tesseract -v
>>> tesseract v5.0.0-alpha.20210811
>>> ...
>>> Warning, detects only orientation with -l jpn
>>> Page number: 0
>>> Orientation in degrees: 270
>>> Rotate: 90
>>> Orientation confidence: 46.00
>>> Script: Latin
>>> Script confidence: 2.00
>>> ```
>>> Should I upgrade to the newest version of tesseract or try some extra
>>> preprocessing methods before detecting text orientation?
>>> Thank you for your time.
>>> Best regards
>>> Hai
>>>
>>>
>>>
>>> On Sat, Mar 11, 2023 at 5:34 AM Zdenko Podobny <[email protected]> wrote:
>>>
>>>> Script detection has always been problematic, and tesseract tries to
>>>> identify only a few...
>>>>
>>>> Regarding rotation you can get better results by using the language
>>>> file:
>>>> ```
>>>> >tesseract unnamed.jpg - --psm 0 -l jpn
>>>> Warning, detects only orientation with -l jpn
>>>> Estimating resolution as 262
>>>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>>>> Page number: 0
>>>> Orientation in degrees: 90
>>>> Rotate: 270
>>>> Orientation confidence: 6.44
>>>> Script: Han
>>>> Script confidence: 1.43
>>>> ```
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Fri, Mar 10, 2023 at 18:21 nguyen ngoc hai <[email protected]> wrote:
>>>>
>>>>> I have the following image:
>>>>>
>>>>>  [image: 17_Receipt Transform No resize.jpg]
>>>>>
>>>>> I used the following code to get the text orientation. It works for
>>>>> most of my samples, but not for the image above.
>>>>>
>>>>> ```python
>>>>>     def get_orientation_confidence(cv2_img_data):
>>>>>         image = cv2pil(cv2_img_data)
>>>>>         osd_result = {}
>>>>>
>>>>>         with tesserocr.PyTessBaseAPI(lang='osd') as api:
>>>>>             api.SetImage(image)
>>>>>             api.SetSourceResolution(300)
>>>>>
>>>>>             osd_result = api.DetectOrientationScript()
>>>>>
>>>>>         return osd_result
>>>>>
>>>>>     # preprocess image before detecting orientation
>>>>>     gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
>>>>>     gray_white_border = self.make_border_white(gray)
>>>>>     self.show_image("gray_white_border", gray_white_border)
>>>>>
>>>>>     # Threshold the image to convert it to black and white
>>>>>     threshold = cv2.threshold(gray_white_border, 0, 255, cv2.THRESH_OTSU)[1]
>>>>>     self.show_image("threshold otsu", threshold)
>>>>>
>>>>>     osd_ret = get_orientation_confidence(pre_roi_im)
>>>>>     print(osd_ret['orient_deg'])
>>>>> ```
>>>>> ```cmd
>>>>> {'orient_deg': 180, 'orient_conf': 0.06795501708984375, 'script_name': 'Arabic', 'script_conf': 0.0}
>>>>> ```
>>>>> Here the orientation I got was not correct, and the detected script
>>>>> was wrong as well.
>>>>>
>>>>> I hoped to get {'orient_deg': 90, 'script_name': 'Japanese', ...}
>>>>> I assume these results come directly from tesseract's output.
>>>>>
>>>>> Is it possible to get the correct orientation degree here?
>>>>> Assuming that I already know the language, are there any methods (such
>>>>> as applying extra image preprocessing, etc.) that can provide better
>>>>> accuracy here?
>>>>>
>>>>> Thank you very much for your time.
>>>>> I hope to hear any suggestions.
>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/e447f23e-a0e1-4a91-b6e1-0eca8511f7acn%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/e447f23e-a0e1-4a91-b6e1-0eca8511f7acn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>
>>>
>>>
>>> --
>>> *Nguyen Ngoc Hai*
>>>
>>> *Phone:  +81 1488 4168  (JP).*
>>> *skype ID: nguyenngochaibkhn.*
>>>
>>>
>>>
>>>
>>
>

