For images. I have to create my own traineddata for my images, so I am following the steps in this documentation:
https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
As per those steps, I have created the box file, the lstmf file and the unicharset file. The next step is to create the traineddata using tesstrain.sh, followed by lstmtraining.exe. I am getting the errors quoted below at the tesstrain.sh step.
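One thing that could be checked here, as a rough sketch only: the assumption (not confirmed anywhere in this thread) is that text2image.exe from the Windows installer is a native Windows program, so a Cygwin path such as /tmp/font_tmp.0doGBqWc3I may not point to the same folder for it as it does inside the Cygwin shell. The helper names to_native_path and build_list_fonts_cmd are made up for illustration. cygpath is the standard Cygwin path-conversion tool, and the text2image flags are the ones already used in this thread.

```shell
#!/usr/bin/env bash
# Sketch: compare the Cygwin view and the native Windows view of a temp
# directory, then print a text2image command that uses the native view.

# Convert a POSIX-style path to a form a native Windows program understands.
# Uses cygpath when it exists (Cygwin); elsewhere the path is returned as-is.
to_native_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -m "$1"        # -m: Windows path written with forward slashes
  else
    printf '%s\n' "$1"
  fi
}

# Print (not run) the font-listing command for a fonts dir and a temp dir.
build_list_fonts_cmd() {
  local fonts_dir=$1 tmp_dir=$2
  printf 'text2image --fonts_dir=%s --fontconfig_tmpdir=%s --list_available_fonts\n' \
    "$(to_native_path "$fonts_dir")" "$(to_native_path "$tmp_dir")"
}

tmp_dir=$(mktemp -d)       # a fresh temp directory, as seen by the shell
echo "POSIX view : $tmp_dir"
echo "Native view: $(to_native_path "$tmp_dir")"
build_list_fonts_cmd "C:/Windows/Fonts" "$tmp_dir"
rm -rf "$tmp_dir"          # clean up the probe directory
```

If the native view printed under Cygwin differs from the /tmp/... path shown in the error message, that would be consistent with fonts.conf not being writable. Running the printed command by hand with the converted path is one way to test this.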
On Wednesday, September 1, 2021 at 6:11:27 PM UTC+5:30 P007 wrote:

> I mean working with font only?
> Or images??
>
> On Wed, 1 Sep 2021 at 6:09 PM, Samruddhi Dhake <[email protected]> wrote:
>
>> Yes, I am working for eng language.
>> I am using tessdata.(C:\Program Files\Tesseract-OCR\tessdata)
>>
>> On Wednesday, September 1, 2021 at 5:57:24 PM UTC+5:30 P007 wrote:
>>
>>> Okay,
>>>
>>> Wait you are working for English language right?
>>> What kind of dataset you used here.
>>>
>>> On Wed, 1 Sep 2021 at 5:53 PM, Samruddhi Dhake <[email protected]>
>>> wrote:
>>>
>>>> No. Tessstrain.sh didn't work. I am running tesstrain.sh on cygwin.
>>>> Command->
>>>> *$ ./src/training/tesstrain.sh --fonts_dir %WINDIR%/Fonts/ --lang eng
>>>> --linedata_only --noextract_font_properties --langdata_dir 'C:/Program
>>>> Files/Tesseract-OCR/langdata' --tessdata_dir 'C:/Program
>>>> Files/Tesseract-OCR/tessdata' --output_dir D:/Test/trainneddata --fontlist
>>>> 'Arial'*
>>>>
>>>> After hitting enter for tesstrain.sh, it is processing text2image and
>>>> giving following error
>>>> === Starting training for language 'eng'
>>>> [Tue Aug 31 19:19:05 IST 2021] /cygdrive/c/Program
>>>> Files/Tesseract-OCR/text2image --fonts_dir=%WINDIR%/Fonts/ --ptsize 12
>>>> --font=Arial --outputbase=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>> --text=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>> --fontconfig_tmpdir=/tmp/font_tmp.0doGBqWc3I
>>>> Unable to open '/tmp/font_tmp.0doGBqWc3I/fonts.conf' for writing
>>>> Fontconfig error: Cannot load default config file
>>>> Could not find font named 'Arial'.
>>>> Please correct --font arg.
>>>> ERROR: Program Program failed. Abort.
>>>>
>>>> As per previous suggestions, I ran text2image.exe command on cmd and
>>>> its working and giving me all available fonts.
>>>>
>>>> Then after running tesstrain.sh, why text2image command is failing and
>>>> it is not creating tempfolder under /tmp/ and I am getting fonts.config
>>>> error.
>>>> It is expected that fonts.config file which gets created in
>>>> tempfolder(here in my case font_tmp.0doGBqWc3I) should gets written and it
>>>> should include font 'Arial' and then Arial font can be found.
>>>> Don't why it is not creating..
>>>>
>>>> Regards,
>>>> Samruddhi
>>>>
>>>> On Wednesday, September 1, 2021 at 5:31:10 PM UTC+5:30 P007 wrote:
>>>>
>>>>>
>>>>> Tesstrain.sh work for you ?
>>>>>
>>>>> On Wed, 1 Sep 2021 at 5:09 PM, Samruddhi Dhake <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> In this text2image, there is an rgument --fontconfig_tempdir which
>>>>>> creates temp folder where fonts.conf gets added.
>>>>>>
>>>>>> I checked /tmp/, no other tempfolder is created( font_tmp.0doGBqWc3I)
>>>>>>
>>>>>> Has anybody this issue?
>>>>>>
>>>>>> Regards,
>>>>>> Samruddhi
>>>>>>
>>>>>> On Tuesday, August 31, 2021 at 7:24:46 PM UTC+5:30 Samruddhi Dhake
>>>>>> wrote:
>>>>>>
>>>>>>> >"C:\Program Files\Tesseract-OCR\text2image.exe"
>>>>>>> --fonts_dir=%WINDIR%/Fonts --fontconfig_tmpdir=/tmp
>>>>>>> --list_available_fonts
>>>>>>> This worked. I got list of available fonts which contains Arial and
>>>>>>> Arial Bold too.
>>>>>>>
>>>>>>> Now this time,in Cygwin Bash, I tried giving --fontlist 'Arial' for
>>>>>>> tesstrain.sh
>>>>>>> Command->
>>>>>>> *$ ./src/training/tesstrain.sh --fonts_dir %WINDIR%/Fonts/ --lang
>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir
>>>>>>> 'C:/Program
>>>>>>> Files/Tesseract-OCR/langdata' --tessdata_dir 'C:/Program
>>>>>>> Files/Tesseract-OCR/tessdata' --output_dir D:/Test/trainneddata
>>>>>>> --fontlist
>>>>>>> 'Arial'*
>>>>>>>
>>>>>>> === Starting training for language 'eng'
>>>>>>> [Tue Aug 31 19:19:05 IST 2021] /cygdrive/c/Program
>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=%WINDIR%/Fonts/ --ptsize 12
>>>>>>> --font=Arial --outputbase=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>>>>> --text=/tmp/font_tmp.0doGBqWc3I/sample_text.txt
>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.0doGBqWc3I
>>>>>>> Unable to open '/tmp/font_tmp.0doGBqWc3I/fonts.conf' for writing
>>>>>>> Fontconfig error: Cannot load default config file
>>>>>>> Could not find font named 'Arial'.
>>>>>>> Please correct --font arg.
>>>>>>> ERROR: Program Program failed. Abort.
>>>>>>>
>>>>>>> Still I am getting this font.conf error. Any idea how to resolve
>>>>>>> this font.conf error?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Samruddhi
>>>>>>>
>>>>>>> On Tuesday, August 31, 2021 at 4:50:14 PM UTC+5:30 zdenop wrote:
>>>>>>>
>>>>>>>> try run this:
>>>>>>>> "C:\Program Files\Tesseract-OCR\text2image.exe"
>>>>>>>> --fonts_dir=%WINDIR%/Fonts --fontconfig_tmpdir=/tmp
>>>>>>>> --list_available_fonts
>>>>>>>>
>>>>>>>> Zdenko
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 30 Aug 2021 at 16:45, Samruddhi Dhake <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am running command ->
>>>>>>>>>
>>>>>>>>> ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts --lang
>>>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir
>>>>>>>>> "C:/Program
>>>>>>>>> Files/Tesseract-OCR/langdata" --tessdata_dir "C:/Program
>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata
>>>>>>>>>
>>>>>>>>> And after hitting enter -> (processing)
>>>>>>>>> === *Starting training for language 'eng'*
>>>>>>>>> *[Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program
>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/ --ptsize
>>>>>>>>> 12
>>>>>>>>> --font=Arial Bold
>>>>>>>>> --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS*
>>>>>>>>> *Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for writing*
>>>>>>>>> *Fontconfig error: Cannot load default config file*
>>>>>>>>> *Could not find font named 'Arial Bold'.*
>>>>>>>>> *Please correct --font arg.*
>>>>>>>>> *ERROR: Program Program failed. Abort.*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I will break it to ask few queries.
>>>>>>>>>
>>>>>>>>> *[Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program
>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/ --ptsize
>>>>>>>>> 12
>>>>>>>>> --font=Arial Bold
>>>>>>>>> --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt
>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS*
>>>>>>>>> *Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for writing*
>>>>>>>>> ----> Here, I am not giving input as Arial Bold. Outputbase , this
>>>>>>>>> should create temp folder 'font_tmp.s9cdSHrzKS' but its not creating.
>>>>>>>>> And so does fontconfig_tmpdir'. So it is giving writing error
>>>>>>>>>
>>>>>>>>> *Fontconfig error: Cannot load default config file*
>>>>>>>>> ----> To resolve this error, I added
>>>>>>>>> FONTCONFIG_FILE=%WINDIR%\fonts.conf to environment
>>>>>>>>> variables(referring
>>>>>>>>> https://forums.wesnoth.org/viewtopic.php?t=22821)
>>>>>>>>> But still not resolved.
>>>>>>>>>
>>>>>>>>> I was checking-> *text2image.exe ----list_available_fonts*
>>>>>>>>> And after hitting enter, I got -> Fontconfig warning:
>>>>>>>>> "/tmp\fonts.conf", line 4: empty font directory name ignored
>>>>>>>>>
>>>>>>>>> The contents of the fonts.conf file which gets created are->
>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>> <!DOCTYPE fontconfig SYSTEM "fonts.dtd">
>>>>>>>>> <fontconfig>
>>>>>>>>> <dir></dir>
>>>>>>>>> <cachedir>/tmp</cachedir>
>>>>>>>>> <config></config>
>>>>>>>>> </fontconfig>
>>>>>>>>>
>>>>>>>>> Can you please help me how can this be resolved? Or Am I giving
>>>>>>>>> correct tesstrain.sh command with its args?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Samruddhi
>>>>>>>>> On Monday, August 30, 2021 at 5:12:21 PM UTC+5:30 zdenop wrote:
>>>>>>>>>
>>>>>>>>>> First of all: use quotes for multi word names, or escape
>>>>>>>>>> space/special symbols (e.g. --font="Arial Bold")
>>>>>>>>>> Next: fix error message: "Unable to open
>>>>>>>>>> '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing"
>>>>>>>>>> Next: check available font for text2image with option
>>>>>>>>>> --list_available_fonts
>>>>>>>>>> etc...
>>>>>>>>>>
>>>>>>>>>> PS: I would suggest using linux for training instead of windows
>>>>>>>>>> (e.g. in WSL[1])
>>>>>>>>>> [1] https://docs.microsoft.com/en-us/windows/wsl/install-win10
>>>>>>>>>>
>>>>>>>>>> Zdenko
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, 30 Aug 2021 at 12:12, Samruddhi Dhake <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Text2Image error is gone. I am getting *font-config error*.
>>>>>>>>>>>
>>>>>>>>>>> SDE26@DTP-SDE26-IND /cygdrive/c/Program Files/Tesseract-OCR
>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts
>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties
>>>>>>>>>>> --langdata_dir
>>>>>>>>>>> "C:/Program Files/Tesseract-OCR/langdata" --tessdata_dir
>>>>>>>>>>> "C:/Program
>>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata
>>>>>>>>>>> Creating new directory D:Testtrainneddata
>>>>>>>>>>>
>>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>>> [Mon Aug 30 15:34:53 IST 2021] /cygdrive/c/Program
>>>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts
>>>>>>>>>>> --ptsize 12
>>>>>>>>>>> --font=Arial Bold
>>>>>>>>>>> --outputbase=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt
>>>>>>>>>>> --text=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt
>>>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.hbC9F3LEQX
>>>>>>>>>>> Unable to open '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing
>>>>>>>>>>> Fontconfig error: Cannot load default config file
>>>>>>>>>>> Could not find font named 'Arial Bold'.
>>>>>>>>>>> Please correct --font arg.
>>>>>>>>>>> ERROR: Program Program failed. Abort.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have Arial Bold font on my machine. Don't know why it cannot
>>>>>>>>>>> find. And in /tmp/ folder there is no font_tmp.hbC9F3LEQX where
>>>>>>>>>>> fonts.conf
>>>>>>>>>>> cannot be opened for writing.
>>>>>>>>>>> How can I resolve this?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Samruddhi
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, August 25, 2021 at 8:18:47 PM UTC+5:30 zdenop
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Honestly, I have no clue what you are doing: text2image is at
>>>>>>>>>>>> the same location as the tesseract executable. So if you have
>>>>>>>>>>>> tesseract in
>>>>>>>>>>>> the path, text2image must work too.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 25 Aug 2021 at 16:26, Samruddhi Dhake <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> As you suggested, I installed Tesseract v5.0.0 on my Windows
>>>>>>>>>>>>> machine (Index of /tesseract (uni-mannheim.de)
>>>>>>>>>>>>> <https://digi.bib.uni-mannheim.de/tesseract/>). This included
>>>>>>>>>>>>> training tools too.
>>>>>>>>>>>>> I performed all the previous steps(boxfile, lstmf
>>>>>>>>>>>>> file,unicharset)
>>>>>>>>>>>>>
>>>>>>>>>>>>> But still after running tesstrain.sh command in Cygwin, I am
>>>>>>>>>>>>> getting following error,
>>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts
>>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties
>>>>>>>>>>>>> --langdata_dir
>>>>>>>>>>>>> "C:/Program Files/Tesseract-OCR/langdata" --tessdata_dir
>>>>>>>>>>>>> "C:/Program
>>>>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir
>>>>>>>>>>>>> D:/Bugs/1206806/folder/trainneddata
>>>>>>>>>>>>> Creating new directory D:/Bugs/1206806/folder/trainneddata
>>>>>>>>>>>>>
>>>>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>>>>> which: no text2image in
>>>>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program Files/Microsoft
>>>>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files
>>>>>>>>>>>>> (x86)/NVIDIA
>>>>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files
>>>>>>>>>>>>> (x86)/Intel/Intel(R)
>>>>>>>>>>>>> Management Engine Components/iCLS:/cygdrive/c/Program
>>>>>>>>>>>>> Files/Intel/Intel(R)
>>>>>>>>>>>>> Management Engine
>>>>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>
>>>>>>>>>>>>> Files (x86)/Microsoft SQL
>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>> Files/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>> Files/Microsoft SQL Server/100/DTS/Binn:/cygdrive/c/Program
>>>>>>>>>>>>> Files/Microsoft/Web Platform Installer:/cygdrive/c/Program Files
>>>>>>>>>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web
>>>>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL
>>>>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program
>>>>>>>>>>>>>
>>>>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows
>>>>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program
>>>>>>>>>>>>> Files/Microsoft
>>>>>>>>>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files (x86)/Windows
>>>>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program Files
>>>>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1 6.0.20/bin:/cygdrive/c/Program
>>>>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL
>>>>>>>>>>>>> Server/Client
>>>>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>
>>>>>>>>>>>>> Files (x86)/Microsoft SQL
>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>> Files/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>> Files
>>>>>>>>>>>>> (x86)/Microsoft SQL Server/150/DTS/Binn:/cygdrive/c/Program
>>>>>>>>>>>>> Files/Microsoft
>>>>>>>>>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files (x86)/Microsoft
>>>>>>>>>>>>> SQL
>>>>>>>>>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>> Files
>>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/DTS/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools)
>>>>>>>>>>>>> which: no text2image in (./api)
>>>>>>>>>>>>> which: no text2image in (./training)
>>>>>>>>>>>>> ERROR: 'text2image' not found
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am I missing something? Can you please guild me?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 5:59:49 PM UTC+5:30 Samruddhi
>>>>>>>>>>>>> Dhake wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you please provide link for steps to install Tesseract
>>>>>>>>>>>>>> and training tools on Windows?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 3:42:48 PM UTC+5:30 Samruddhi
>>>>>>>>>>>>>> Dhake wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How to install tesseract and training tools on Windows?
>>>>>>>>>>>>>>> Do I have to install Tesseract Windows exe?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 3:20:37 PM UTC+5:30 zdenop
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So there are only 2 possibilities:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. Install tesseract and training tools
>>>>>>>>>>>>>>>> 2. Learn how to handle & use not installed sw. This
>>>>>>>>>>>>>>>> option is not related to tesseract.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, 24 Aug 2021 at 9:17, Samruddhi Dhake <[email protected]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I haven't installed Tesseract. I have kept in a folder and
>>>>>>>>>>>>>>>>> I am running exe by giving its path. I have generated
>>>>>>>>>>>>>>>>> training tools
>>>>>>>>>>>>>>>>> through source code.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To create box file, command->(I gave absoulute path of
>>>>>>>>>>>>>>>>> tesseract.exe)
>>>>>>>>>>>>>>>>> ..\tesseract.exe Dim4.tif Dim4 lstmbox
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To create box file, command->
>>>>>>>>>>>>>>>>> tesseract.exe Dim4.tif Dim4 lstm.train
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To create unicharset, command->
>>>>>>>>>>>>>>>>> unicharset_extractor.exe --output_unicharset
>>>>>>>>>>>>>>>>> ..\own.unicharset ..\langdata\eng\eng.training_text
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And to create trainned data, using tesstrain.sh command,
>>>>>>>>>>>>>>>>> .\src\training\tesstrain.sh --fonts_dir C:\Windows\Fonts
>>>>>>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties
>>>>>>>>>>>>>>>>> --langdata_dir
>>>>>>>>>>>>>>>>> langdata --tessdata_dir tessdata --output_dir trainneddata
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 12:24:29 PM UTC+5:30
>>>>>>>>>>>>>>>>> Samruddhi Dhake wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have generated training tools through source code.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Monday, August 23, 2021 at 7:09:02 PM UTC+5:30 zdenop
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How did you install tesseract? Did you also install
>>>>>>>>>>>>>>>>>>> training tools?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, 23 Aug 2021 at 15:34, Samruddhi Dhake <[email protected]>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am creating my own trainneddata using tesseract
>>>>>>>>>>>>>>>>>>>> v4.1.1 on Windows 10.
>>>>>>>>>>>>>>>>>>>> I am referring documentation
>>>>>>>>>>>>>>>>>>>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have successfully created .box file and .lstmf file
>>>>>>>>>>>>>>>>>>>> using lstmbox and lstm.train respectively.
>>>>>>>>>>>>>>>>>>>> So next step, I installed Cygwin to run tesstrain.sh
>>>>>>>>>>>>>>>>>>>> command to create training data.
>>>>>>>>>>>>>>>>>>>> But I am getting below error.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir
>>>>>>>>>>>>>>>>>>>> C:/Windows/Fonts --lang eng --linedata_only
>>>>>>>>>>>>>>>>>>>> --noextract_font_properties
>>>>>>>>>>>>>>>>>>>> --langdata_dir ./langdata --tessdata_dir ./tessdata
>>>>>>>>>>>>>>>>>>>> --output_dir
>>>>>>>>>>>>>>>>>>>> ./trainneddata
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>>>>>>>>>>>> which: no text2image in
>>>>>>>>>>>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>> Files/Microsoft
>>>>>>>>>>>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>> (x86)/NVIDIA
>>>>>>>>>>>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>> (x86)/Intel/Intel(R)
>>>>>>>>>>>>>>>>>>>> Management Engine Components/iCLS:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>> Files/Intel/Intel(R)
>>>>>>>>>>>>>>>>>>>> Management Engine
>>>>>>>>>>>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Files (x86)/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/100/DTS/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>> Files/Microsoft/Web Platform Installer:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>> Files
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web
>>>>>>>>>>>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>> (x86)/Windows
>>>>>>>>>>>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>> Files/Microsoft
>>>>>>>>>>>>>>>>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>> (x86)/Windows
>>>>>>>>>>>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>> Files
>>>>>>>>>>>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1
>>>>>>>>>>>>>>>>>>>> 6.0.20/bin:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/Client
>>>>>>>>>>>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Files (x86)/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/150/DTS/Binn:/cygdrive/c/Program Files/Microsoft
>>>>>>>>>>>>>>>>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>> Files
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/140/Tools/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/140/DTS/Binn:/cygdrive/c/Program Files
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL
>>>>>>>>>>>>>>>>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools)
>>>>>>>>>>>>>>>>>>>> which: no text2image in (./api)
>>>>>>>>>>>>>>>>>>>> which: no text2image in (./training)
>>>>>>>>>>>>>>>>>>>> ERROR: 'text2image' not found
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I found text2image comes after running command 'make
>>>>>>>>>>>>>>>>>>>> training'.
>>>>>>>>>>>>>>>>>>>> Can you please help me how this can be done in WIndows
>>>>>>>>>>>>>>>>>>>> 10?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> You received this message because you are subscribed to
>>>>>>>>>>>>>>>>>>>> the Google Groups "tesseract-ocr" group.
>>>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving
>>>>>>>>>>>>>>>>>>>> emails from it, send an email to
>>>>>>>>>>>>>>>>>>>> [email protected].
>>>>>>>>>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/5adf563d-117b-4bd8-a283-dd21e53575f4n%40googlegroups.com
>>>>>>>>>>>>>>>>>>>> .

