Hi Ali,
How is your training going? Are you getting good results with training from scratch?
On Friday, September 15, 2023 at 6:42:26 PM UTC+3 tesseract-ocr wrote:
> Yes, I saw that two months ago when I started learning OCR. It was very
> helpful at the beginning.
> On Friday, 15 September, 2023 at 4:01:32 pm UTC+6 desal...@gmail.com wrote:
>
>> Just saw this paper: https://osf.io/b8h7q
>>
>> On Thursday, September 14, 2023 at 9:02:22 PM UTC+3 mdalihu...@gmail.com wrote:
>>
>>> I will try some changes. Thanks.
>>>
>>> On Thursday, 14 September, 2023 at 2:46:36 pm UTC+6 elvi...@gmail.com wrote:
>>>
>>>> I also faced that issue on Windows. Apparently, the issue is related to
>>>> Unicode. You can try your luck by changing "r" to "utf8" in the script.
>>>> I ended up installing Ubuntu because I was getting too many errors on Windows.
>>>>
>>>> On Thu, Sep 14, 2023, 9:33 AM Ali hussain <mdalihu...@gmail.com> wrote:
>>>>
>>>>> Did you face the error "Can't encode transcription"? If you did, how
>>>>> did you solve it?
>>>>>
>>>>> On Thursday, 14 September, 2023 at 10:51:52 am UTC+6 elvi...@gmail.com wrote:
>>>>>
>>>>>> I was using my own text.
>>>>>>
>>>>>> On Thu, Sep 14, 2023, 6:58 AM Ali hussain <mdalihu...@gmail.com> wrote:
>>>>>>
>>>>>>> Are you training with Tesseract's default text data or with text data
>>>>>>> you collected yourself?
>>>>>>> On Thursday, 14 September, 2023 at 12:19:53 am UTC+6 desal...@gmail.com wrote:
>>>>>>>
>>>>>>>> I have now reached 200000 iterations, and the error rate is stuck at
>>>>>>>> 0.46. The result is absolutely trash: nowhere close to the
>>>>>>>> default/Ray's training.
>>>>>>>>
>>>>>>>> On Wednesday, September 13, 2023 at 2:47:05 PM UTC+3 mdalihu...@gmail.com wrote:
>>>>>>>>
>>>>>>>>> After Tesseract recognizes the text from the images, you can apply
>>>>>>>>> regex to replace the wrong words with the correct ones.
>>>>>>>>> I'm also not familiar with PaddleOCR or ScanTailor.
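For readers following the post-OCR correction idea in the message above, here is a minimal sketch of how such a replacement step might look in Python. The table name CORRECTIONS and the function post_correct are hypothetical; the four entries are copied from the mapping quoted further down in this thread, and a real table would need to be built from your own recurring errors. As noted elsewhere in the thread, this only helps when Tesseract makes the same mistake consistently.

```python
import re

# Hypothetical correction table: wrong OCR sequence -> intended sequence.
# The entries are taken from the mapping quoted in this thread.
CORRECTIONS = {
    "আা": "আ",
    "ম্ি": "মি",
    "স্ু": "সু",
    "ন্া": "না",
}

# One alternation pattern, longest keys first so that a longer
# wrong sequence is preferred over a shorter one that is its prefix.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(CORRECTIONS, key=len, reverse=True))
)


def post_correct(text: str) -> str:
    """Replace every known wrong sequence in the OCR output in a single pass."""
    return _PATTERN.sub(lambda match: CORRECTIONS[match.group(0)], text)
```

The function would be called on the text returned by Tesseract, before the text is written to the final output. A single compiled pattern is used so that one replacement cannot accidentally feed into another.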
>>>>>>>>>
>>>>>>>>> On Wednesday, 13 September, 2023 at 5:06:12 pm UTC+6 desal...@gmail.com wrote:
>>>>>>>>>
>>>>>>>>>> At what stage are you doing the regex replacement?
>>>>>>>>>> My process has been: Scan (tif) --> ScanTailor --> Tesseract --> pdf
>>>>>>>>>>
>>>>>>>>>> > EasyOCR I think is best for ID cards or something like that
>>>>>>>>>> > image process. but document images like books, here Tesseract is
>>>>>>>>>> > better than EasyOCR.
>>>>>>>>>>
>>>>>>>>>> How about PaddleOCR? Are you familiar with it?
>>>>>>>>>>
>>>>>>>>>> On Wednesday, September 13, 2023 at 1:45:54 PM UTC+3 mdalihu...@gmail.com wrote:
>>>>>>>>>>
>>>>>>>>>>> I know what you mean, but in some cases it helps me. I have found
>>>>>>>>>>> that specific characters and words are never recognized correctly
>>>>>>>>>>> by Tesseract. So I use these regexes to replace those characters
>>>>>>>>>>> and words when they are incorrect.
>>>>>>>>>>>
>>>>>>>>>>> See what I have done:
>>>>>>>>>>>
>>>>>>>>>>> " ী": "ী",
>>>>>>>>>>> " ্": " ",
>>>>>>>>>>> " ে": " ",
>>>>>>>>>>> জ্া: "জা",
>>>>>>>>>>> " ": " ",
>>>>>>>>>>> " ": " ",
>>>>>>>>>>> " ": " ",
>>>>>>>>>>> "্প": " ",
>>>>>>>>>>> " য": "র্য",
>>>>>>>>>>> য: "য",
>>>>>>>>>>> " া": "া",
>>>>>>>>>>> আা: "আ",
>>>>>>>>>>> ম্ি: "মি",
>>>>>>>>>>> স্ু: "সু",
>>>>>>>>>>> "হূ ": "হূ",
>>>>>>>>>>> " ণ": "ণ",
>>>>>>>>>>> র্্: "র",
>>>>>>>>>>> "চিন্ত ": "চিন্তা ",
>>>>>>>>>>> ন্া: "না",
>>>>>>>>>>> "সম ূর্ন": "সম্পূর্ণ",
>>>>>>>>>>> On Wednesday, 13 September, 2023 at 4:18:22 pm UTC+6 desal...@gmail.com wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The problem with regex is that Tesseract is not consistent in
>>>>>>>>>>>> its replacements.
>>>>>>>>>>>> Suppose the original English training data didn't contain the
>>>>>>>>>>>> letter /u/. What does Tesseract do when it encounters /u/ in
>>>>>>>>>>>> actual processing?
>>>>>>>>>>>> In some cases, it replaces it with closely similar letters such
>>>>>>>>>>>> as /v/ and /w/. In other cases, it removes it completely. That is
>>>>>>>>>>>> what is happening in my case. Those characters are sometimes
>>>>>>>>>>>> removed completely; other times, they are replaced by closely
>>>>>>>>>>>> resembling characters. Because of this inconsistency, applying
>>>>>>>>>>>> regex is very difficult.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, September 13, 2023 at 1:02:01 PM UTC+3 mdalihu...@gmail.com wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> If some specific characters or words are always missing from
>>>>>>>>>>>>> the OCR result, you can apply logic with regular expressions in
>>>>>>>>>>>>> your application. After OCR, those specific characters or words
>>>>>>>>>>>>> are replaced by the correct characters or words that you defined
>>>>>>>>>>>>> in your application with regular expressions. This can solve
>>>>>>>>>>>>> some of the major problems.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wednesday, 13 September, 2023 at 3:51:29 pm UTC+6 desal...@gmail.com wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The characters are getting missed, even after fine-tuning.
>>>>>>>>>>>>>> I never made any progress. I tried many different ways. Some
>>>>>>>>>>>>>> specific characters are always missing from the OCR result.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wednesday, September 13, 2023 at 12:49:20 PM UTC+3 mdalihu...@gmail.com wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think EasyOCR is best for ID cards or similar image
>>>>>>>>>>>>>>> processing, but for document images like books, Tesseract is
>>>>>>>>>>>>>>> better than EasyOCR. I haven't even used EasyOCR myself; you
>>>>>>>>>>>>>>> can try it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have added words to the dictionaries, but the result is the
>>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What kind of problem did you face when fine-tuning with a few
>>>>>>>>>>>>>>> new characters, as you said (*but, I failed in every possible
>>>>>>>>>>>>>>> way to introduce a few new characters into the database.*)?
>>>>>>>>>>>>>>> On Wednesday, 13 September, 2023 at 3:33:48 pm UTC+6 desal...@gmail.com wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, we are new to this. I find the instructions (the manual)
>>>>>>>>>>>>>>>> very hard to follow. The video you linked above was really
>>>>>>>>>>>>>>>> helpful for getting started. My plan at the beginning was to
>>>>>>>>>>>>>>>> fine-tune the existing .traineddata. But I failed in every
>>>>>>>>>>>>>>>> possible way to introduce a few new characters into the
>>>>>>>>>>>>>>>> database. That is why I started from scratch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sure, I will follow Lorenzo's suggestion: I will run more
>>>>>>>>>>>>>>>> iterations and see if I can improve.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Another area we need to explore is the use of dictionaries.
>>>>>>>>>>>>>>>> Maybe adding millions of words to the dictionary could help
>>>>>>>>>>>>>>>> Tesseract. I don't have millions of words, but I am looking
>>>>>>>>>>>>>>>> into some corpora to get more words into the dictionary.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If this all fails, EasyOCR (and probably other similar
>>>>>>>>>>>>>>>> open-source packages) is probably our next option to try.
>>>>>>>>>>>>>>>> Sure, sharing our experiences will be helpful. I will let you
>>>>>>>>>>>>>>>> know if I make good progress with any of these options.
>>>>>>>>>>>>>>>> On Wednesday, September 13, 2023 at 12:19:48 PM UTC+3 mdalihu...@gmail.com wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How is your training going for Bengali? It was nearly good,
>>>>>>>>>>>>>>>>> but I faced a spacing problem between words: some words have
>>>>>>>>>>>>>>>>> spaces between them, but most of them have none. I think the
>>>>>>>>>>>>>>>>> problem is in the dataset, but I use the default training
>>>>>>>>>>>>>>>>> dataset from Tesseract, which is used in Ben. That is why I
>>>>>>>>>>>>>>>>> am confused, so I have to explore more. By the way, you can
>>>>>>>>>>>>>>>>> try what Lorenzo Blz said. Training from scratch is actually
>>>>>>>>>>>>>>>>> harder than fine-tuning, so you can explore with different
>>>>>>>>>>>>>>>>> datasets. If you succeed, please let me know how you did the
>>>>>>>>>>>>>>>>> whole process. I'm also new to this field.
>>>>>>>>>>>>>>>>> On Wednesday, 13 September, 2023 at 1:13:43 pm UTC+6 desal...@gmail.com wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How is your training going for Bengali?
>>>>>>>>>>>>>>>>>> I have been trying to train from scratch. I made about
>>>>>>>>>>>>>>>>>> 64,000 lines of text (which produced about 255,000 files in
>>>>>>>>>>>>>>>>>> the end) and ran the training for 150,000 iterations,
>>>>>>>>>>>>>>>>>> getting a 0.51 training error rate. I was hoping to get
>>>>>>>>>>>>>>>>>> reasonable accuracy. Unfortunately, when I run the OCR using
>>>>>>>>>>>>>>>>>> the .traineddata, the accuracy is absolutely terrible.
>>>>>>>>>>>>>>>>>> Do you think I made some mistakes, or is that an expected
>>>>>>>>>>>>>>>>>> result?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tuesday, September 12, 2023 at 11:15:25 PM UTC+3 mdalihu...@gmail.com wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, he doesn't mention all fonts but only one font.
>>>>>>>>>>>>>>>>>>> That is why he didn't use *MODEL_NAME* in a separate
>>>>>>>>>>>>>>>>>>> script file, I think.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Actually, here we teach all the *tif, gt.txt, and .box
>>>>>>>>>>>>>>>>>>> files* that are created with *MODEL_NAME*, I mean the
>>>>>>>>>>>>>>>>>>> *eng, ben, oro* flag or language code, because when we
>>>>>>>>>>>>>>>>>>> first create the *tif, gt.txt, and .box files*, every file
>>>>>>>>>>>>>>>>>>> name starts with *MODEL_NAME*. This *MODEL_NAME* is what
>>>>>>>>>>>>>>>>>>> we select in the training script for looping over each of
>>>>>>>>>>>>>>>>>>> the tif, gt.txt, and .box files created with *MODEL_NAME*.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tuesday, 12 September, 2023 at 9:42:13 pm UTC+6 desal...@gmail.com wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, I am familiar with the video and have set up the
>>>>>>>>>>>>>>>>>>>> folder structure as you did. Indeed, I have tried
>>>>>>>>>>>>>>>>>>>> fine-tuning a number of times with a single font,
>>>>>>>>>>>>>>>>>>>> following Gracia's video. But your script is much better
>>>>>>>>>>>>>>>>>>>> because it supports multiple fonts. The whole improvement
>>>>>>>>>>>>>>>>>>>> you made is brilliant and very useful. It is all working
>>>>>>>>>>>>>>>>>>>> for me. The only part that I didn't understand is the
>>>>>>>>>>>>>>>>>>>> trick you used in your tesseract_train.py script. You
>>>>>>>>>>>>>>>>>>>> see, I have been doing exactly what you did, except for
>>>>>>>>>>>>>>>>>>>> this script.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The script seems to have the trick of sending/teaching
>>>>>>>>>>>>>>>>>>>> each of the fonts (iteratively) into the model. The
>>>>>>>>>>>>>>>>>>>> script I have been using (which I got from Garcia)
>>>>>>>>>>>>>>>>>>>> doesn't mention fonts at all.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> *TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=oro TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000*
>>>>>>>>>>>>>>>>>>>> Does it mean that my model doesn't train on the fonts
>>>>>>>>>>>>>>>>>>>> (even if the fonts have been included in the splitting
>>>>>>>>>>>>>>>>>>>> process, in the other script)?
>>>>>>>>>>>>>>>>>>>> On Monday, September 11, 2023 at 10:54:08 AM UTC+3 mdalihu...@gmail.com wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> import subprocess
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> # List of font names
>>>>>>>>>>>>>>>>>>>>> font_names = ['ben']
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> for font in font_names:
>>>>>>>>>>>>>>>>>>>>>     command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
>>>>>>>>>>>>>>>>>>>>>     subprocess.run(command, shell=True)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. This is the training script, which I named
>>>>>>>>>>>>>>>>>>>>> 'tesseract_training.py' and placed inside the tesstrain
>>>>>>>>>>>>>>>>>>>>> folder.
>>>>>>>>>>>>>>>>>>>>> 2. Root directory means your main training folder, which
>>>>>>>>>>>>>>>>>>>>> contains the langdata, tesseract, and tesstrain folders.
>>>>>>>>>>>>>>>>>>>>> If you watch this tutorial
>>>>>>>>>>>>>>>>>>>>> https://www.youtube.com/watch?v=KE4xEzFGSU8 you will
>>>>>>>>>>>>>>>>>>>>> understand the folder structure better. I only created
>>>>>>>>>>>>>>>>>>>>> tesseract_training.py in the tesstrain folder for
>>>>>>>>>>>>>>>>>>>>> training; the FontList.py file is in the main path,
>>>>>>>>>>>>>>>>>>>>> alongside langdata, tesseract, tesstrain, and
>>>>>>>>>>>>>>>>>>>>> split_training_text.py.
>>>>>>>>>>>>>>>>>>>>> 3. First of all, you have to put all the fonts in your
>>>>>>>>>>>>>>>>>>>>> Linux fonts folder, /usr/share/fonts/, then run:
>>>>>>>>>>>>>>>>>>>>> sudo apt update, then sudo fc-cache -fv
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> After that, you have to add the exact font names to the
>>>>>>>>>>>>>>>>>>>>> FontList.py file, as I did.
>>>>>>>>>>>>>>>>>>>>> I have attached two pictures of my folder structure: the
>>>>>>>>>>>>>>>>>>>>> first is the main structure and the second is the
>>>>>>>>>>>>>>>>>>>>> collapsed tesstrain folder.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [image: Screenshot 2023-09-11 134947.png][image: Screenshot 2023-09-11 135014.png]
>>>>>>>>>>>>>>>>>>>>> On Monday, 11 September, 2023 at 12:50:03 pm UTC+6 desal...@gmail.com wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank you so much for putting out these brilliant
>>>>>>>>>>>>>>>>>>>>>> scripts. They make the process much more efficient.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I have one more question about the other script that
>>>>>>>>>>>>>>>>>>>>>> you use to train.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> import subprocess
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # List of font names
>>>>>>>>>>>>>>>>>>>>>> font_names = ['ben']
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> for font in font_names:
>>>>>>>>>>>>>>>>>>>>>>     command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
>>>>>>>>>>>>>>>>>>>>>>     subprocess.run(command, shell=True)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Do you have the names of the fonts listed in a file in
>>>>>>>>>>>>>>>>>>>>>> the same/root directory?
>>>>>>>>>>>>>>>>>>>>>> How do you set up the names of the fonts in the file,
>>>>>>>>>>>>>>>>>>>>>> if you don't mind sharing it?
>>>>>>>>>>>>>>>>>>>>>> On Monday, September 11, 2023 at 4:27:27 AM UTC+3 mdalihu...@gmail.com wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> You can use the new script below. It's better than the
>>>>>>>>>>>>>>>>>>>>>>> previous two scripts. You can create the *tif, gt.txt,
>>>>>>>>>>>>>>>>>>>>>>> and .box files* with multiple fonts and also use a
>>>>>>>>>>>>>>>>>>>>>>> breakpoint: if VS Code closes or anything else happens
>>>>>>>>>>>>>>>>>>>>>>> while creating the *tif, gt.txt, and .box files*, you
>>>>>>>>>>>>>>>>>>>>>>> can use the checkpoint to resume from where VS Code
>>>>>>>>>>>>>>>>>>>>>>> closed.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Script for creating the *tif, gt.txt, and .box files*:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> import os
>>>>>>>>>>>>>>>>>>>>>>> import random
>>>>>>>>>>>>>>>>>>>>>>> import pathlib
>>>>>>>>>>>>>>>>>>>>>>> import subprocess
>>>>>>>>>>>>>>>>>>>>>>> import argparse
>>>>>>>>>>>>>>>>>>>>>>> from FontList import FontList
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> def create_training_data(training_text_file, font_list, output_directory, start_line=None, end_line=None):
>>>>>>>>>>>>>>>>>>>>>>>     lines = []
>>>>>>>>>>>>>>>>>>>>>>>     with open(training_text_file, 'r') as input_file:
>>>>>>>>>>>>>>>>>>>>>>>         lines = input_file.readlines()
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     if not os.path.exists(output_directory):
>>>>>>>>>>>>>>>>>>>>>>>         os.mkdir(output_directory)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     if start_line is None:
>>>>>>>>>>>>>>>>>>>>>>>         start_line = 0
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     if end_line is None:
>>>>>>>>>>>>>>>>>>>>>>>         end_line = len(lines) - 1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     for font_name in font_list.fonts:
>>>>>>>>>>>>>>>>>>>>>>>         for line_index in range(start_line, end_line + 1):
>>>>>>>>>>>>>>>>>>>>>>>             line = lines[line_index].strip()
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>             training_text_file_name = pathlib.Path(training_text_file).stem
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>             line_serial = f"{line_index:d}"
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>             line_gt_text = os.path.join(output_directory, f'{training_text_file_name}_{line_serial}_{font_name.replace(" ", "_")}.gt.txt')
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>             with open(line_gt_text, 'w') as output_file:
>>>>>>>>>>>>>>>>>>>>>>>                 output_file.writelines([line])
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>             file_base_name = f'{training_text_file_name}_{line_serial}_{font_name.replace(" ", "_")}'
>>>>>>>>>>>>>>>>>>>>>>>             subprocess.run([
>>>>>>>>>>>>>>>>>>>>>>>                 'text2image',
>>>>>>>>>>>>>>>>>>>>>>>                 f'--font={font_name}',
>>>>>>>>>>>>>>>>>>>>>>>                 f'--text={line_gt_text}',
>>>>>>>>>>>>>>>>>>>>>>>                 f'--outputbase={output_directory}/{file_base_name}',
>>>>>>>>>>>>>>>>>>>>>>>                 '--max_pages=1',
>>>>>>>>>>>>>>>>>>>>>>>                 '--strip_unrenderable_words',
>>>>>>>>>>>>>>>>>>>>>>>                 '--leading=36',
>>>>>>>>>>>>>>>>>>>>>>>                 '--xsize=3600',
>>>>>>>>>>>>>>>>>>>>>>>                 '--ysize=330',
>>>>>>>>>>>>>>>>>>>>>>>                 '--char_spacing=1.0',
>>>>>>>>>>>>>>>>>>>>>>>                 '--exposure=0',
>>>>>>>>>>>>>>>>>>>>>>>                 '--unicharset_file=langdata/eng.unicharset',
>>>>>>>>>>>>>>>>>>>>>>>             ])
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> if __name__ == "__main__":
>>>>>>>>>>>>>>>>>>>>>>>     parser = argparse.ArgumentParser()
>>>>>>>>>>>>>>>>>>>>>>>     parser.add_argument('--start', type=int, help='Starting line count (inclusive)')
>>>>>>>>>>>>>>>>>>>>>>>     parser.add_argument('--end', type=int, help='Ending line count (inclusive)')
>>>>>>>>>>>>>>>>>>>>>>>     args = parser.parse_args()
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     training_text_file = 'langdata/eng.training_text'
>>>>>>>>>>>>>>>>>>>>>>>     output_directory = 'tesstrain/data/eng-ground-truth'
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     font_list = FontList()
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     create_training_data(training_text_file, font_list, output_directory, args.start, args.end)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Then create a file called "FontList" in the root
>>>>>>>>>>>>>>>>>>>>>>> directory and paste this into it:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> class FontList:
>>>>>>>>>>>>>>>>>>>>>>>     def __init__(self):
>>>>>>>>>>>>>>>>>>>>>>>         self.fonts = [
>>>>>>>>>>>>>>>>>>>>>>>             "Gerlick",
>>>>>>>>>>>>>>>>>>>>>>>             "Sagar Medium",
>>>>>>>>>>>>>>>>>>>>>>>             "Ekushey Lohit Normal",
>>>>>>>>>>>>>>>>>>>>>>>             "Charukola Round Head Regular, weight=433",
>>>>>>>>>>>>>>>>>>>>>>>             "Charukola Round Head Bold, weight=443",
>>>>>>>>>>>>>>>>>>>>>>>             "Ador Orjoma Unicode",
>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Then import it in the code above.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> *Command for the breakpoint:*
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> sudo python3 split_training_text.py --start 0 --end 11
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Change the checkpoint to suit you: --start 0 --end 11.
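To make the --start/--end checkpoint above concrete, here is a small sketch of how an inclusive line range might be selected from the training text. The helper name select_lines is hypothetical and is not part of the posted script. The explicit encoding="utf-8" is an assumption, added because of the Unicode errors on Windows reported in this thread; the clamping of out-of-range values is also an addition for illustration.

```python
def select_lines(text_path, start=None, end=None):
    """Return (index, stripped line) pairs for the inclusive range [start, end].

    Hypothetical helper, not part of the posted script. A missing start means
    the first line and a missing end means the last line.
    """
    # Assumption: the training text is UTF-8; naming the encoding avoids
    # depending on the platform default.
    with open(text_path, "r", encoding="utf-8") as input_file:
        lines = [line.strip() for line in input_file]

    first = 0 if start is None else max(start, 0)
    last = len(lines) - 1 if end is None else min(end, len(lines) - 1)

    # range() excludes its upper bound, hence the + 1 for an inclusive end.
    return [(index, lines[index]) for index in range(first, last + 1)]
```

With this shape, an interrupted run that stopped after line 11 could be continued by passing start=12 and leaving end unset, which mirrors changing --start on the command line.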
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> *And the training checkpoint, as you already know.*
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Monday, 11 September, 2023 at 1:22:34 am UTC+6 desal...@gmail.com wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi mhalidu,
>>>>>>>>>>>>>>>>>>>>>>>> The script you posted here seems much more extensive
>>>>>>>>>>>>>>>>>>>>>>>> than the one you posted before:
>>>>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/0e2880d9-64c0-4659-b497-902a5747caf4n%40googlegroups.com
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I have been using your earlier script. It is magical.
>>>>>>>>>>>>>>>>>>>>>>>> How is this one different from the earlier one?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thank you for posting these scripts, by the way. They
>>>>>>>>>>>>>>>>>>>>>>>> have saved me countless hours by running multiple
>>>>>>>>>>>>>>>>>>>>>>>> fonts in one sweep. I was not able to find any
>>>>>>>>>>>>>>>>>>>>>>>> instructions on how to train for multiple fonts. The
>>>>>>>>>>>>>>>>>>>>>>>> official manual is also unclear. Your script helped
>>>>>>>>>>>>>>>>>>>>>>>> me get started.
>>>>>>>>>>>>>>>>>>>>>>>> On Wednesday, August 9, 2023 at 11:00:49 PM UTC+3 mdalihu...@gmail.com wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> OK, I will try as you said.
>>>>>>>>>>>>>>>>>>>>>>>>> One more thing: what role do the trained_text lines
>>>>>>>>>>>>>>>>>>>>>>>>> play? I have seen that the Bengali text has long
>>>>>>>>>>>>>>>>>>>>>>>>> lines of words, so I want to know how many words or
>>>>>>>>>>>>>>>>>>>>>>>>> characters per line would be the better choice for
>>>>>>>>>>>>>>>>>>>>>>>>> training. And should '--xsize=3600', '--ysize=350'
>>>>>>>>>>>>>>>>>>>>>>>>> be set according to the number of words per line?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thursday, 10 August, 2023 at 1:10:14 am UTC+6 shree wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Include the default fonts also in your fine-tuning
>>>>>>>>>>>>>>>>>>>>>>>>>> list of fonts and see if that helps.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 9, 2023, 2:27 PM Ali hussain <mdalihu...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I have trained some new fonts with the fine-tuning
>>>>>>>>>>>>>>>>>>>>>>>>>>> method for the Bengali language in Tesseract 5,
>>>>>>>>>>>>>>>>>>>>>>>>>>> and I used all the official trained_text,
>>>>>>>>>>>>>>>>>>>>>>>>>>> tessdata_best, and other things as well.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Everything is good, but the problem is that the
>>>>>>>>>>>>>>>>>>>>>>>>>>> default font, which was trained before, no longer
>>>>>>>>>>>>>>>>>>>>>>>>>>> converts text as it did previously, while my new
>>>>>>>>>>>>>>>>>>>>>>>>>>> fonts work well. I don't understand why this is
>>>>>>>>>>>>>>>>>>>>>>>>>>> happening. I am sharing the code so you can
>>>>>>>>>>>>>>>>>>>>>>>>>>> understand what is going on.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> *Code for creating the tif, gt.txt, and .box files:*
>>>>>>>>>>>>>>>>>>>>>>>>>>> import os
>>>>>>>>>>>>>>>>>>>>>>>>>>> import random
>>>>>>>>>>>>>>>>>>>>>>>>>>> import pathlib
>>>>>>>>>>>>>>>>>>>>>>>>>>> import subprocess
>>>>>>>>>>>>>>>>>>>>>>>>>>> import argparse
>>>>>>>>>>>>>>>>>>>>>>>>>>> from FontList import FontList
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> def read_line_count():
>>>>>>>>>>>>>>>>>>>>>>>>>>>     if os.path.exists('line_count.txt'):
>>>>>>>>>>>>>>>>>>>>>>>>>>>         with open('line_count.txt', 'r') as file:
>>>>>>>>>>>>>>>>>>>>>>>>>>>             return int(file.read())
>>>>>>>>>>>>>>>>>>>>>>>>>>>     return 0
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> def write_line_count(line_count):
>>>>>>>>>>>>>>>>>>>>>>>>>>>     with open('line_count.txt', 'w') as file:
>>>>>>>>>>>>>>>>>>>>>>>>>>>         file.write(str(line_count))
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> def create_training_data(training_text_file, font_list, output_directory, start_line=None, end_line=None):
>>>>>>>>>>>>>>>>>>>>>>>>>>>     lines = []
>>>>>>>>>>>>>>>>>>>>>>>>>>>     with open(training_text_file, 'r') as input_file:
>>>>>>>>>>>>>>>>>>>>>>>>>>>         for line in input_file.readlines():
>>>>>>>>>>>>>>>>>>>>>>>>>>>             lines.append(line.strip())
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     if not os.path.exists(output_directory):
>>>>>>>>>>>>>>>>>>>>>>>>>>>         os.mkdir(output_directory)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     random.shuffle(lines)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     if start_line is None:
>>>>>>>>>>>>>>>>>>>>>>>>>>>         line_count = read_line_count()  # Set the starting line_count from the file
>>>>>>>>>>>>>>>>>>>>>>>>>>>     else:
>>>>>>>>>>>>>>>>>>>>>>>>>>>         line_count = start_line
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     if end_line is None:
>>>>>>>>>>>>>>>>>>>>>>>>>>>         end_line_count = len(lines) - 1  # Set the ending line_count
>>>>>>>>>>>>>>>>>>>>>>>>>>>     else:
>>>>>>>>>>>>>>>>>>>>>>>>>>>         end_line_count = min(end_line, len(lines) - 1)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     for font in font_list.fonts:  # Iterate through all the fonts in the font_list
>>>>>>>>>>>>>>>>>>>>>>>>>>>         font_serial = 1
>>>>>>>>>>>>>>>>>>>>>>>>>>>         for line in lines:
>>>>>>>>>>>>>>>>>>>>>>>>>>>             training_text_file_name = pathlib.Path(training_text_file).stem
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>             # Generate a unique serial number for each line
>>>>>>>>>>>>>>>>>>>>>>>>>>>             line_serial = f"{line_count:d}"
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>             # GT (Ground Truth) text filename
>>>>>>>>>>>>>>>>>>>>>>>>>>>             line_gt_text = os.path.join(output_directory, f'{training_text_file_name}_{line_serial}.gt.txt')
>>>>>>>>>>>>>>>>>>>>>>>>>>>             with open(line_gt_text, 'w') as output_file:
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 output_file.writelines([line])
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>             # Image filename
>>>>>>>>>>>>>>>>>>>>>>>>>>>             file_base_name = f'ben_{line_serial}'  # Unique filename for each font
>>>>>>>>>>>>>>>>>>>>>>>>>>>             subprocess.run([
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 'text2image',
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 f'--font={font}',
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 f'--text={line_gt_text}',
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 f'--outputbase={output_directory}/{file_base_name}',
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 '--max_pages=1',
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 '--strip_unrenderable_words',
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 '--leading=36',
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 '--xsize=3600',
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 '--ysize=350',
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 '--char_spacing=1.0',
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 '--exposure=0',
>>>>>>>>>>>>>>>>>>>>>>>>>>>                 '--unicharset_file=langdata/ben.unicharset',
>>>>>>>>>>>>>>>>>>>>>>>>>>>             ])
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>             line_count += 1
>>>>>>>>>>>>>>>>>>>>>>>>>>>             font_serial += 1
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>         # Reset font_serial for the next font iteration
>>>>>>>>>>>>>>>>>>>>>>>>>>>         font_serial = 1
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     write_line_count(line_count)  # Update the line_count in the file
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> if __name__ == "__main__":
>>>>>>>>>>>>>>>>>>>>>>>>>>>     parser = argparse.ArgumentParser()
>>>>>>>>>>>>>>>>>>>>>>>>>>>     parser.add_argument('--start', type=int, help='Starting line count (inclusive)')
>>>>>>>>>>>>>>>>>>>>>>>>>>>     parser.add_argument('--end', type=int, help='Ending line count (inclusive)')
>>>>>>>>>>>>>>>>>>>>>>>>>>>     args = parser.parse_args()
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     training_text_file = 'langdata/ben.training_text'
>>>>>>>>>>>>>>>>>>>>>>>>>>>     output_directory = 'tesstrain/data/ben-ground-truth'
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     # Create an instance of the FontList class
>>>>>>>>>>>>>>>>>>>>>>>>>>>     font_list = FontList()
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>     create_training_data(training_text_file, font_list, output_directory, args.start, args.end)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> *And the training code:*
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> import subprocess
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> # List of font names
>>>>>>>>>>>>>>>>>>>>>>>>>>> font_names = ['ben']
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> for font in font_names:
>>>>>>>>>>>>>>>>>>>>>>>>>>>     command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000 LANG_TYPE=Indic"
>>>>>>>>>>>>>>>>>>>>>>>>>>>     subprocess.run(command, shell=True)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Any suggestions for identifying the problem?
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, everyone.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>>>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>>>>>>>>>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/406cd733-b265-4118-a7ca-de75871cac39n%40googlegroups.com