One more thing: I used a language file from https://github.com/tesseract-ocr/tessdata, i.e. traineddata that includes the legacy engine data.
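
In case it helps to script this from Python without tesserocr, here is a minimal sketch that calls the command-line tool with `--psm 0 -l jpn` and parses the report it prints. `parse_osd` and `run_osd` are hypothetical helper names, the field labels are copied from the console output quoted further down in this thread, and I am assuming the report can be read from the captured stdout/stderr when the output base is `-`.

```python
import subprocess

# Field labels as they appear in the `--psm 0` report quoted in this thread,
# mapped to the key names tesserocr uses where those exist.
_OSD_FIELDS = {
    "Page number": ("page_num", int),
    "Orientation in degrees": ("orient_deg", int),
    "Rotate": ("rotate", int),
    "Orientation confidence": ("orient_conf", float),
    "Script": ("script_name", str),
    "Script confidence": ("script_conf", float),
}


def parse_osd(text):
    """Turn the 'Label: value' lines of an OSD report into a dict.

    Lines without a known label (warnings, resolution notes) are skipped.
    """
    result = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        field = _OSD_FIELDS.get(label.strip())
        if sep and field:
            name, cast = field
            result[name] = cast(value.strip())
    return result


def run_osd(image_path, lang="jpn", tessdata_dir=None):
    """Run `tesseract <image> - --psm 0 -l <lang>` and parse its report.

    tessdata_dir is optional; it should point at a folder holding the
    traineddata from the tessdata repository (with legacy engine data).
    Assumption: the report ends up in the captured stdout or stderr.
    """
    cmd = ["tesseract", image_path, "-", "--psm", "0", "-l", lang]
    if tessdata_dir:
        cmd += ["--tessdata-dir", tessdata_dir]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_osd(proc.stdout + "\n" + proc.stderr)


if __name__ == "__main__":
    # Needs the tesseract binary on PATH; unnamed.jpg is the file name
    # used in the command quoted in this thread.
    print(run_osd("unnamed.jpg"))
```

The `Rotate` value is kept alongside `orient_deg` because the two differ in the report (90 versus 270 for this image), so it is worth checking which one your rotation step expects.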

Zdenko

On Sat, Mar 11, 2023 at 13:18, nguyen ngoc hai <[email protected]> wrote:

> Thank you very much for your help.
> I will give it a try.
>
> Best regards
> Hai
>
> On Sat, Mar 11, 2023, 8:14 PM Zdenko Podobny <[email protected]> wrote:
>
>> the latest code (5.3.0) (on windows)
>>
>> Zdenko
>>
>> On Sat, Mar 11, 2023 at 2:16, nguyen ngoc hai <[email protected]> wrote:
>>
>>> Dear Zdenko,
>>>
>>> Thank you very much for your suggestion.
>>>
>>> May I ask which version of tesseract you are using?
>>> I ran the same command with tesseract v5.0.0, but I got a different
>>> result.
>>>
>>> ```
>>> >tesseract -v
>>> tesseract v5.0.0-alpha.20210811
>>> ...
>>> Warning, detects only orientation with -l jpn
>>> Page number: 0
>>> Orientation in degrees: 270
>>> Rotate: 90
>>> Orientation confidence: 46.00
>>> Script: Latin
>>> Script confidence: 2.00
>>> ```
>>> Should I upgrade to the newest version of tesseract or try some extra
>>> preprocessing methods before detecting text orientation?
>>> Thank you for your time.
>>> Best regards
>>> Hai
>>>
>>> On Sat, Mar 11, 2023 at 5:34 AM Zdenko Podobny <[email protected]> wrote:
>>>
>>>> script detection was always problematic and tesseract tries to
>>>> identify only a few...
>>>>
>>>> Regarding rotation you can get better results by using the language
>>>> file:
>>>> >tesseract unnamed.jpg - --psm 0 -l jpn
>>>> Warning, detects only orientation with -l jpn
>>>> Estimating resolution as 262
>>>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>>>> Page number: 0
>>>> Orientation in degrees: 90
>>>> Rotate: 270
>>>> Orientation confidence: 6.44
>>>> Script: Han
>>>> Script confidence: 1.43
>>>>
>>>> Zdenko
>>>>
>>>> On Fri, Mar 10, 2023 at 18:21, nguyen ngoc hai <[email protected]> wrote:
>>>>
>>>>> I have the following image:
>>>>>
>>>>> [image: 17_Receipt Transform No resize.jpg]
>>>>>
>>>>> I used the following code to get the text orientation. It works for
>>>>> most of my samples, but not for the image above.
>>>>>
>>>>> ```python
>>>>> import cv2
>>>>> import tesserocr
>>>>>
>>>>> # cv2pil(), the self.* helpers and pre_roi_im are defined elsewhere (not shown)
>>>>>
>>>>> def get_orientation_confidence(cv2_img_data):
>>>>>     image = cv2pil(cv2_img_data)
>>>>>     osd_result = {}
>>>>>
>>>>>     with tesserocr.PyTessBaseAPI(lang='osd') as api:
>>>>>         api.SetImage(image)
>>>>>         api.SetSourceResolution(300)
>>>>>
>>>>>         osd_result = api.DetectOrientationScript()
>>>>>
>>>>>     return osd_result
>>>>>
>>>>> # preprocess image before detecting orientation
>>>>> gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
>>>>> gray_white_border = self.make_border_white(gray)
>>>>> self.show_image("gray_white_border", gray_white_border)
>>>>>
>>>>> # Threshold the image to convert it to black and white
>>>>> threshold = cv2.threshold(gray_white_border, 0, 255,
>>>>>                           cv2.THRESH_OTSU)[1]
>>>>> self.show_image("threshold otsu", threshold)
>>>>>
>>>>> osd_ret = get_orientation_confidence(pre_roi_im)
>>>>> print(osd_ret['orient_deg'])
>>>>> ```
>>>>> ```cmd
>>>>> {'orient_deg': 180, 'orient_conf': 0.06795501708984375, 'script_name': 'Arabic', 'script_conf': 0.0}
>>>>> ```
>>>>> Here, the orientation I got was not correct, and the script
>>>>> detection was wrong as well.
>>>>>
>>>>> I hope to get {'orient_deg': 90, 'script_name': 'Japanese', ...}
>>>>> I assume these values come directly from tesseract's output.
>>>>>
>>>>> Is it possible to get the correct orientation degree here?
>>>>> Assuming that I already know the language, are there any methods (such
>>>>> as applying extra image preprocessing, etc.) that can provide better
>>>>> accuracy here?
>>>>>
>>>>> Thank you very much for your time.
>>>>> I hope to hear any suggestions.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/e447f23e-a0e1-4a91-b6e1-0eca8511f7acn%40googlegroups.com
>>>>> .
>>>>>
>>>
>>> --
>>> *Nguyen Ngoc Hai*
>>>
>>> *Phone: +81 1488 4168 (JP).*
>>> *skype ID: nguyenngochaibkhn.*

