Yes, I saw that two months ago when I started to learn OCR. It was very helpful at the beginning.

On Friday, 15 September 2023 at 4:01:32 pm UTC+6, desal...@gmail.com wrote:
Just saw this paper: https://osf.io/b8h7q

On Thursday, September 14, 2023 at 9:02:22 PM UTC+3, mdalihu...@gmail.com wrote:

I will try some changes. Thanks.

On Thursday, 14 September 2023 at 2:46:36 pm UTC+6, elvi...@gmail.com wrote:

I also faced that issue on Windows. Apparently the issue is related to Unicode: you can try your luck by changing the plain "r" open mode in the script to use UTF-8 encoding. I ended up installing Ubuntu because I was having too many errors on Windows.

On Thu, Sep 14, 2023, 9:33 AM Ali hussain <mdalihu...@gmail.com> wrote:

Did you face this error: "Can't encode transcription"? If you did, how did you solve it?

On Thursday, 14 September 2023 at 10:51:52 am UTC+6, elvi...@gmail.com wrote:

I was using my own text.

On Thu, Sep 14, 2023, 6:58 AM Ali hussain <mdalihu...@gmail.com> wrote:

Are you training from Tesseract's default text data or your own collected text data?

On Thursday, 14 September 2023 at 12:19:53 am UTC+6, desal...@gmail.com wrote:

I have now reached 200,000 iterations, and the error rate is stuck at 0.46. The result is absolutely terrible: nowhere close to the default/Ray's training.

On Wednesday, September 13, 2023 at 2:47:05 PM UTC+3, mdalihu...@gmail.com wrote:

After Tesseract recognizes the text from the images, you can apply regex to replace the wrong words with the correct ones. I'm not familiar with PaddleOCR or ScanTailor either.

On Wednesday, 13 September 2023 at 5:06:12 pm UTC+6, desal...@gmail.com wrote:

At what stage are you doing the regex replacement? My process has been: scan (tif) --> ScanTailor --> Tesseract --> pdf.

> EasyOCR I think is best for ID cards or something like that image process. but document images like books, here Tesseract is better than EasyOCR.

How about PaddleOCR? Are you familiar with it?

On Wednesday, September 13, 2023 at 1:45:54 PM UTC+3, mdalihu...@gmail.com wrote:

I know what you mean, but in some cases it helps me. I have found that specific characters and words are consistently not recognized by Tesseract, so I use these replacements to correct those characters and words when they come out wrong.

See what I have done:

    " ী": "ী",
    " ্": " ",
    " ে": " ",
    "জ্া": "জা",
    " ": " ",
    " ": " ",
    " ": " ",
    "্প": " ",
    " য": "র্য",
    "য": "য",
    " া": "া",
    "আা": "আ",
    "ম্ি": "মি",
    "স্ু": "সু",
    "হূ ": "হূ",
    " ণ": "ণ",
    "র্্": "র",
    "চিন্ত ": "চিন্তা ",
    "ন্া": "না",
    "সম ূর্ন": "সম্পূর্ণ",

On Wednesday, 13 September 2023 at 4:18:22 pm UTC+6, desal...@gmail.com wrote:

The problem with regex is that Tesseract is not consistent in its substitutions. Suppose the original English training data didn't contain the letter /u/: what does Tesseract do when it encounters /u/ during actual processing? In some cases it replaces it with a closely similar letter such as /v/ or /w/; in other cases it removes it completely. That is what is happening in my case: those characters are sometimes removed entirely, and other times they are replaced by closely resembling characters. Because of this inconsistency, applying regex is very difficult.
On Wednesday, September 13, 2023 at 1:02:01 PM UTC+3, mdalihu...@gmail.com wrote:

If some specific characters or words are always missing from the OCR result, you can apply regular-expression logic in your application: after OCR, the specific characters or words you defined are replaced with the correct ones. That can handle some major problems.

On Wednesday, 13 September 2023 at 3:51:29 pm UTC+6, desal...@gmail.com wrote:

The characters are getting missed even after fine-tuning. I never made any progress, although I tried many different ways. Some specific characters are always missing from the OCR result.

On Wednesday, September 13, 2023 at 12:49:20 PM UTC+3, mdalihu...@gmail.com wrote:

EasyOCR, I think, is best for ID cards and similar images, but for document images like books Tesseract is better than EasyOCR. I haven't used EasyOCR myself; you can try it.

I have added dictionary words, but the result is the same.

What kind of problem did you face when fine-tuning for a few new characters, as you said ("but I failed in every possible way to introduce a few new characters into the database")?

On Wednesday, 13 September 2023 at 3:33:48 pm UTC+6, desal...@gmail.com wrote:

Yes, we are new to this. I find the instructions (the manual) very hard to follow. The video you linked above was really helpful for getting started. My plan at the beginning was to fine-tune the existing .traineddata, but I failed in every possible way to introduce a few new characters into the database. That is why I started from scratch.

Sure, I will follow Lorenzo's suggestion: I will run more iterations and see if I can improve.

Another area we need to explore is the usage of dictionaries. Maybe adding millions of words to the dictionary could help Tesseract. I don't have millions of words, but I am looking into some corpora to get more words into the dictionary.

If this all fails, EasyOCR (and probably other similar open-source packages) is probably our next option to try. Sure, sharing our experiences will be helpful; I will let you know if I make good progress with any of these options.

On Wednesday, September 13, 2023 at 12:19:48 PM UTC+3, mdalihu...@gmail.com wrote:

> How is your training going for Bengali?

It was nearly good, but I faced spacing problems between words: some words get a space, but most of them have none. I think the problem is in the dataset, but I used the default Bengali training dataset from Tesseract, so I am confused and have to explore more. By the way, you can try what Lorenzo Blz said. Actually, training from scratch is harder than fine-tuning.
So you can use different datasets to explore. If you succeed, please let me know how you did the whole process. I'm also new to this field.

On Wednesday, 13 September 2023 at 1:13:43 pm UTC+6, desal...@gmail.com wrote:

How is your training going for Bengali?

I have been trying to train from scratch. I made about 64,000 lines of text (which produced about 255,000 files in the end) and ran the training for 150,000 iterations, getting a 0.51 training error rate. I was hoping for reasonable accuracy. Unfortunately, when I run OCR using the resulting .traineddata, the accuracy is absolutely terrible. Do you think I made some mistakes, or is that an expected result?

On Tuesday, September 12, 2023 at 11:15:25 PM UTC+3, mdalihu...@gmail.com wrote:

Yes, he mentions only one font, not all of them. That is why he didn't use MODEL_NAME in a separate script file, I think.

Actually, here we train on all the .tif, .gt.txt, and .box files created under MODEL_NAME (I mean the eng, ben, or oro language code), because when we first create the .tif, .gt.txt, and .box files, every file name starts with MODEL_NAME.
This MODEL_NAME is what we select in the training script to loop over each of the .tif, .gt.txt, and .box files created under that MODEL_NAME.

On Tuesday, 12 September 2023 at 9:42:13 pm UTC+6, desal...@gmail.com wrote:

Yes, I am familiar with the video and have set up the folder structure as you did. Indeed, I have tried a number of fine-tuning runs with a single font following Gracia's video. But your script is much better because it supports multiple fonts. The whole improvement you made is brilliant and very useful, and it is all working for me.

The only part that I didn't understand is the trick you used in your tesseract_train.py script. You see, I have been doing exactly what you did except for this script.

Your script seems to have the trick of feeding each of the fonts (iteratively) into the model. The script I have been using (which I got from Garcia) doesn't mention fonts at all:

    TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=oro TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000

Does it mean that my model doesn't train on the fonts (even if the fonts have been included in the splitting process, in the other script)?
On Monday, September 11, 2023 at 10:54:08 AM UTC+3, mdalihu...@gmail.com wrote:

    import subprocess

    # List of font names
    font_names = ['ben']

    for font in font_names:
        command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
        subprocess.run(command, shell=True)

1. This command is for training the data; I have named the script 'tesseract_training.py' and put it inside the tesstrain folder.
2. The root directory means your main training folder, which contains the langdata, tesseract, and tesstrain folders. If you watch this tutorial, https://www.youtube.com/watch?v=KE4xEzFGSU8, you will understand the folder structure better. I only created tesseract_training.py in the tesstrain folder for training; the FontList.py file is in the main path, alongside langdata, tesseract, tesstrain, and split_training_text.py.
3. First of all, you have to put all the fonts in your Linux fonts folder (/usr/share/fonts/), then run sudo apt update and then sudo fc-cache -fv. After that, you have to add the exact font names to the FontList.py file, like mine.

I have attached two pictures of my folder structure.
The first is the main structure, and the second is the collapsed tesstrain folder.

[image: Screenshot 2023-09-11 134947.png] [image: Screenshot 2023-09-11 135014.png]

On Monday, 11 September 2023 at 12:50:03 pm UTC+6, desal...@gmail.com wrote:

Thank you so much for putting out these brilliant scripts. They make the process much more efficient.

I have one more question about the other script that you use to train:

    import subprocess

    # List of font names
    font_names = ['ben']

    for font in font_names:
        command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
        subprocess.run(command, shell=True)

Do you have the names of the fonts listed in a file in the same/root directory? How do you set up the names of the fonts in that file, if you don't mind sharing it?

On Monday, September 11, 2023 at 4:27:27 AM UTC+3, mdalihu...@gmail.com wrote:

You can use the new script below. It's better than the previous two scripts.
You can create the .tif, .gt.txt, and .box files with multiple fonts, and it also supports a breakpoint: if VS Code closes (or anything else happens) while the .tif, .gt.txt, and .box files are being created, you can use the checkpoint to resume from where it stopped.

Script for the .tif, .gt.txt, and .box files:

    import os
    import pathlib
    import subprocess
    import argparse
    from FontList import FontList

    def create_training_data(training_text_file, font_list, output_directory,
                             start_line=None, end_line=None):
        with open(training_text_file, 'r') as input_file:
            lines = input_file.readlines()

        if not os.path.exists(output_directory):
            os.mkdir(output_directory)

        if start_line is None:
            start_line = 0
        if end_line is None:
            end_line = len(lines) - 1

        for font_name in font_list.fonts:
            for line_index in range(start_line, end_line + 1):
                line = lines[line_index].strip()
                training_text_file_name = pathlib.Path(training_text_file).stem
                line_serial = f"{line_index:d}"

                # Ground-truth text file for this line
                line_gt_text = os.path.join(
                    output_directory,
                    f'{training_text_file_name}_{line_serial}_{font_name.replace(" ", "_")}.gt.txt')
                with open(line_gt_text, 'w') as output_file:
                    output_file.writelines([line])

                file_base_name = f'{training_text_file_name}_{line_serial}_{font_name.replace(" ", "_")}'
                subprocess.run([
                    'text2image',
                    f'--font={font_name}',
                    f'--text={line_gt_text}',
                    f'--outputbase={output_directory}/{file_base_name}',
                    '--max_pages=1',
                    '--strip_unrenderable_words',
                    '--leading=36',
                    '--xsize=3600',
                    '--ysize=330',
                    '--char_spacing=1.0',
                    '--exposure=0',
                    '--unicharset_file=langdata/eng.unicharset',
                ])

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument('--start', type=int, help='Starting line count (inclusive)')
        parser.add_argument('--end', type=int, help='Ending line count (inclusive)')
        args = parser.parse_args()

        training_text_file = 'langdata/eng.training_text'
        output_directory = 'tesstrain/data/eng-ground-truth'

        font_list = FontList()
        create_training_data(training_text_file, font_list, output_directory,
                             args.start, args.end)

Then create a file called FontList.py in the root directory and paste this:

    class FontList:
        def __init__(self):
            self.fonts = [
                "Gerlick",
                "Sagar Medium",
                "Ekushey Lohit Normal",
                "Charukola Round Head Regular, weight=433",
                "Charukola Round Head Bold, weight=443",
                "Ador Orjoma Unicode",
            ]

It is then imported in the script above.

Command with a breakpoint:

    sudo python3 split_training_text.py --start 0 --end 11

Change the checkpoint range (--start 0 --end 11) as needed. And the training checkpoint works as you already know.

On Monday, 11 September 2023 at 1:22:34 am UTC+6, desal...@gmail.com wrote:

Hi mhalidu, the script you posted here seems much more extensive than the one you posted before: https://groups.google.com/d/msgid/tesseract-ocr/0e2880d9-64c0-4659-b497-902a5747caf4n%40googlegroups.com.
I have been using your earlier script. It is magical. How is this one different from the earlier one?

Thank you for posting these scripts, by the way. They have saved me countless hours by running multiple fonts in one sweep. I was not able to find any instructions on how to train for multiple fonts, and the official manual is also unclear. Your script helped me get started.

On Wednesday, August 9, 2023 at 11:00:49 PM UTC+3, mdalihu...@gmail.com wrote:

OK, I will try as you said. One more thing: what should the training_text lines look like? I have seen that Bengali texts have long lines of words, so I want to know how many words or characters per line would be the better choice for training. And should '--xsize=3600', '--ysize=350' be set according to the number of words per line?

On Thursday, 10 August 2023 at 1:10:14 am UTC+6, shree wrote:

Include the default fonts also in your fine-tuning list of fonts and see if that helps.
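One hedged sketch of how shree's suggestion could be folded into the FontList pattern used earlier in the thread. Every font name below is a placeholder; use the fonts the base model was actually trained on:

```python
# Merge the fonts used by the base model with the new fonts being
# fine-tuned, so training still sees the original fonts.
# All font names here are placeholders, not the real base-model fonts.
DEFAULT_FONTS = ["Lohit Bengali", "Mukti Narrow"]  # assumed base-model fonts
NEW_FONTS = ["Gerlick", "Sagar Medium"]

class FontList:
    def __init__(self, extra_fonts=()):
        # Deduplicate while keeping order: defaults first, then new fonts.
        seen = set()
        self.fonts = []
        for name in [*DEFAULT_FONTS, *extra_fonts]:
            if name not in seen:
                seen.add(name)
                self.fonts.append(name)

font_list = FontList(NEW_FONTS)
print(font_list.fonts)
```

The rest of the pipeline (the split script and text2image loop) can then use font_list.fonts unchanged.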
On Wed, Aug 9, 2023, 2:27 PM Ali hussain <mdalihu...@gmail.com> wrote:

I have trained some new fonts with the fine-tuning method for the Bengali language in Tesseract 5, using the official training text, tessdata_best, and everything else. Everything is good, but the problem is that the default fonts that were trained before no longer convert text as they did previously, while my new fonts work well. I don't understand why this is happening. I am sharing the code so you can see what is going on.

Code for creating the .tif, .gt.txt, and .box files:

    import os
    import random
    import pathlib
    import subprocess
    import argparse
    from FontList import FontList

    def read_line_count():
        if os.path.exists('line_count.txt'):
            with open('line_count.txt', 'r') as file:
                return int(file.read())
        return 0

    def write_line_count(line_count):
        with open('line_count.txt', 'w') as file:
            file.write(str(line_count))

    def create_training_data(training_text_file, font_list, output_directory,
                             start_line=None, end_line=None):
        lines = []
        with open(training_text_file, 'r') as input_file:
            for line in input_file.readlines():
                lines.append(line.strip())

        if not os.path.exists(output_directory):
            os.mkdir(output_directory)

        random.shuffle(lines)

        if start_line is None:
            line_count = read_line_count()  # Resume from the saved checkpoint
        else:
            line_count = start_line

        if end_line is None:
            end_line_count = len(lines) - 1
        else:
            end_line_count = min(end_line, len(lines) - 1)
        # Note: end_line_count is computed, but the loop below still
        # renders every line for every font.

        for font in font_list.fonts:  # Iterate through all fonts in the list
            font_serial = 1
            for line in lines:
                training_text_file_name = pathlib.Path(training_text_file).stem

                # Generate a unique serial number for each line
                line_serial = f"{line_count:d}"

                # GT (ground-truth) text filename
                line_gt_text = os.path.join(
                    output_directory,
                    f'{training_text_file_name}_{line_serial}.gt.txt')
                with open(line_gt_text, 'w') as output_file:
                    output_file.writelines([line])

                # Image filename, unique for each font
                file_base_name = f'ben_{line_serial}'
                subprocess.run([
                    'text2image',
                    f'--font={font}',
                    f'--text={line_gt_text}',
                    f'--outputbase={output_directory}/{file_base_name}',
                    '--max_pages=1',
                    '--strip_unrenderable_words',
                    '--leading=36',
                    '--xsize=3600',
                    '--ysize=350',
                    '--char_spacing=1.0',
                    '--exposure=0',
                    '--unicharset_file=langdata/ben.unicharset',
                ])

                line_count += 1
                font_serial += 1

            # Reset font_serial for the next font
            font_serial = 1

        write_line_count(line_count)  # Update the checkpoint file

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument('--start', type=int, help='Starting line count (inclusive)')
        parser.add_argument('--end', type=int, help='Ending line count (inclusive)')
        args = parser.parse_args()

        training_text_file = 'langdata/ben.training_text'
        output_directory = 'tesstrain/data/ben-ground-truth'

        # Create an instance of the FontList class
        font_list = FontList()
        create_training_data(training_text_file, font_list, output_directory,
                             args.start, args.end)

And the training code:

    import subprocess

    # List of model names
    font_names = ['ben']

    for font in font_names:
        command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000 LANG_TYPE=Indic"
        subprocess.run(command, shell=True)

Any suggestions for identifying the problem?
Thanks, everyone.

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f57a721f-c8a8-4e86-9664-6a71ff337333n%40googlegroups.com.