I am working with images.
I have to create my own traineddata for my images, so I am following the 
steps in this documentation:
https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
Following those steps, I have created the box file, the lstmf file, and the 
unicharset file. The next step is to create the traineddata using 
tesstrain.sh, followed by lstmtraining.exe.
I am getting these errors at the tesstrain.sh step.

On Wednesday, September 1, 2021 at 6:11:27 PM UTC+5:30 P007 wrote:

> I mean, are you working with fonts only?
> Or with images?
>
> On Wed, 1 Sep 2021 at 6:09 PM, Samruddhi Dhake <[email protected]> wrote:
>
>> Yes, I am working with the eng language.
>> I am using the tessdata in C:\Program Files\Tesseract-OCR\tessdata.
>>
>> On Wednesday, September 1, 2021 at 5:57:24 PM UTC+5:30 P007 wrote:
>>
>>> Okay.
>>>
>>> Wait, you are working with the English language, right?
>>> What kind of dataset did you use here?
>>>
>>> On Wed, 1 Sep 2021 at 5:53 PM, Samruddhi Dhake <[email protected]> 
>>> wrote:
>>>
>>>> No, tesstrain.sh didn't work. I am running tesstrain.sh in Cygwin.
>>>>  Command->
>>>> *$ ./src/training/tesstrain.sh --fonts_dir %WINDIR%/Fonts/ --lang eng 
>>>> --linedata_only --noextract_font_properties --langdata_dir 'C:/Program 
>>>> Files/Tesseract-OCR/langdata' --tessdata_dir 'C:/Program 
>>>> Files/Tesseract-OCR/tessdata' --output_dir D:/Test/trainneddata --fontlist 
>>>> 'Arial'*
>>>>
>>>> After I press Enter, tesstrain.sh runs text2image, which gives the 
>>>> following error:
>>>> === Starting training for language 'eng'
>>>> [Tue Aug 31 19:19:05 IST 2021] /cygdrive/c/Program 
>>>> Files/Tesseract-OCR/text2image --fonts_dir=%WINDIR%/Fonts/ --ptsize 12 
>>>> --font=Arial --outputbase=/tmp/font_tmp.0doGBqWc3I/sample_text.txt 
>>>> --text=/tmp/font_tmp.0doGBqWc3I/sample_text.txt 
>>>> --fontconfig_tmpdir=/tmp/font_tmp.0doGBqWc3I
>>>> Unable to open '/tmp/font_tmp.0doGBqWc3I/fonts.conf' for writing
>>>> Fontconfig error: Cannot load default config file
>>>> Could not find font named 'Arial'.
>>>> Please correct --font arg.
>>>> ERROR: Program Program failed. Abort.
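
A note on the command above: two of its arguments may not reach text2image the way they were typed. The log shows --fonts_dir=%WINDIR%/Fonts/ passed through literally, because a POSIX shell does not expand cmd.exe-style %VAR% references. An earlier run in this thread printed "Creating new directory D:Testtrainneddata", because the shell removes unquoted backslashes. The sketch below reproduces both effects in plain shell. The name to_forward_slashes is a hypothetical helper, and the C:\Windows fallback is an assumption for shells where WINDIR is not set.

```shell
#!/bin/sh
# Sketch only: shows how a POSIX shell (such as Cygwin bash) treats the
# Windows-style arguments used in this thread.

# 1. %VAR% is cmd.exe syntax; the shell leaves it as literal text.
literal='%WINDIR%/Fonts/'
printf 'passed as typed: %s\n' "$literal"

# 2. An unquoted backslash escapes the next character and is then dropped.
unquoted=$(printf '%s' D:\Test\trainneddata)
quoted=$(printf '%s' 'D:\Test\trainneddata')
printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"

# 3. Hypothetical helper: rewrite backslashes as forward slashes, the
#    form already used for the other paths in this thread.
to_forward_slashes() {
  printf '%s' "$1" | tr '\\' '/'
}

# Assumption: WINDIR is exported into the shell; fall back to C:\Windows.
fonts_dir="$(to_forward_slashes "${WINDIR:-C:\\Windows}")/Fonts"
printf 'fonts dir: %s\n' "$fonts_dir"
```

Passing the value of fonts_dir to --fonts_dir, and quoting or forward-slashing the --output_dir path, should at least make the arguments in the log match what was intended.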
>>>>
>>>> As suggested earlier, I ran the text2image.exe command in cmd; it 
>>>> works and lists all available fonts.
>>>>
>>>> So why does the text2image command fail when tesstrain.sh runs it? It 
>>>> does not create the temp folder under /tmp/, and I get the fonts.conf 
>>>> error.
>>>> I expected the fonts.conf file to be written in the temp folder (in my 
>>>> case font_tmp.0doGBqWc3I) and to include the font 'Arial', so that the 
>>>> Arial font can be found.
>>>> I don't know why it is not being created.
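
One thing that may be worth checking here (an assumption, not something confirmed in this thread): tesstrain.sh runs inside Cygwin, so /tmp/font_tmp.* is a Cygwin path, while the text2image under C:/Program Files/Tesseract-OCR is a native Windows program that may resolve /tmp differently, for example relative to the root of the current drive. Cygwin ships a cygpath tool that prints the Windows form of a Cygwin path. The sketch below wraps it in a hypothetical helper, native_path, and returns the path unchanged when cygpath is not available.

```shell
#!/bin/sh
# Sketch only: print the path a native Windows program would need
# for a directory that was created from inside Cygwin.

# Hypothetical helper. `cygpath -m` prints the Windows form of a path
# with forward slashes; outside Cygwin the path is returned unchanged.
native_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -m "$1"
  else
    printf '%s\n' "$1"
  fi
}

# Create a scratch directory from the shell, as a script under Cygwin
# would, then show both spellings of its location.
scratch=$(mktemp -d)
printf 'shell path:  %s\n' "$scratch"
printf 'native path: %s\n' "$(native_path "$scratch")"

# Writing a file through the shell path shows the directory is writable.
: > "$scratch/fonts.conf" && printf 'fonts.conf is writable\n'

rm -rf "$scratch"
```

If the two paths differ, running text2image.exe by hand with the native form given to --fontconfig_tmpdir may show whether the location is the problem.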
>>>>
>>>> Regards,
>>>> Samruddhi
>>>>
>>>> On Wednesday, September 1, 2021 at 5:31:10 PM UTC+5:30 P007 wrote:
>>>>
>>>>>
>>>>> Did tesstrain.sh work for you?
>>>>>
>>>>> On Wed, 1 Sep 2021 at 5:09 PM, Samruddhi Dhake <[email protected]> 
>>>>> wrote:
>>>>>
>>>>>> text2image has an argument, --fontconfig_tmpdir, which sets the temp 
>>>>>> folder in which fonts.conf is created.
>>>>>>
>>>>>> I checked /tmp/; no temp folder (font_tmp.0doGBqWc3I) has been created.
>>>>>>
>>>>>> Has anybody else had this issue?
>>>>>>
>>>>>> Regards,
>>>>>> Samruddhi
>>>>>>
>>>>>> On Tuesday, August 31, 2021 at 7:24:46 PM UTC+5:30 Samruddhi Dhake 
>>>>>> wrote:
>>>>>>
>>>>>>> >"C:\Program Files\Tesseract-OCR\text2image.exe" 
>>>>>>> --fonts_dir=%WINDIR%/Fonts --fontconfig_tmpdir=/tmp 
>>>>>>> --list_available_fonts
>>>>>>> This worked. I got the list of available fonts, which contains Arial 
>>>>>>> and Arial Bold too.
>>>>>>>
>>>>>>> This time, in Cygwin Bash, I tried giving --fontlist 'Arial' to 
>>>>>>> tesstrain.sh.
>>>>>>> Command->
>>>>>>> *$ ./src/training/tesstrain.sh --fonts_dir %WINDIR%/Fonts/ --lang 
>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir 
>>>>>>> 'C:/Program 
>>>>>>> Files/Tesseract-OCR/langdata' --tessdata_dir 'C:/Program 
>>>>>>> Files/Tesseract-OCR/tessdata' --output_dir D:/Test/trainneddata 
>>>>>>> --fontlist 
>>>>>>> 'Arial'*
>>>>>>>
>>>>>>> === Starting training for language 'eng'
>>>>>>> [Tue Aug 31 19:19:05 IST 2021] /cygdrive/c/Program 
>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=%WINDIR%/Fonts/ --ptsize 12 
>>>>>>> --font=Arial --outputbase=/tmp/font_tmp.0doGBqWc3I/sample_text.txt 
>>>>>>> --text=/tmp/font_tmp.0doGBqWc3I/sample_text.txt 
>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.0doGBqWc3I
>>>>>>> Unable to open '/tmp/font_tmp.0doGBqWc3I/fonts.conf' for writing
>>>>>>> Fontconfig error: Cannot load default config file
>>>>>>> Could not find font named 'Arial'.
>>>>>>> Please correct --font arg.
>>>>>>> ERROR: Program Program failed. Abort.
>>>>>>>
>>>>>>> I am still getting this fonts.conf error. Any idea how to resolve 
>>>>>>> it?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Samruddhi
>>>>>>>
>>>>>>> On Tuesday, August 31, 2021 at 4:50:14 PM UTC+5:30 zdenop wrote:
>>>>>>>
>>>>>>>> Try running this:
>>>>>>>> "C:\Program Files\Tesseract-OCR\text2image.exe" 
>>>>>>>> --fonts_dir=%WINDIR%/Fonts --fontconfig_tmpdir=/tmp 
>>>>>>>> --list_available_fonts
>>>>>>>>
>>>>>>>> Zdenko
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 30 Aug 2021 at 16:45, Samruddhi Dhake <[email protected]> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am running command ->
>>>>>>>>>
>>>>>>>>> ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts --lang 
>>>>>>>>> eng --linedata_only --noextract_font_properties --langdata_dir 
>>>>>>>>> "C:/Program 
>>>>>>>>> Files/Tesseract-OCR/langdata" --tessdata_dir "C:/Program 
>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata
>>>>>>>>>
>>>>>>>>> And after hitting enter -> (processing)
>>>>>>>>> === *Starting training for language 'eng'*
>>>>>>>>> *[Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program 
>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/ --ptsize 
>>>>>>>>> 12 
>>>>>>>>> --font=Arial Bold 
>>>>>>>>> --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt 
>>>>>>>>> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt 
>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS*
>>>>>>>>> *Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for writing*
>>>>>>>>> *Fontconfig error: Cannot load default config file*
>>>>>>>>> *Could not find font named 'Arial Bold'.*
>>>>>>>>> *Please correct --font arg.*
>>>>>>>>> *ERROR: Program Program failed. Abort.*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I will break this down to ask a few questions.
>>>>>>>>>
>>>>>>>>> *[Mon Aug 30 16:51:10 IST 2021] /cygdrive/c/Program 
>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts/ --ptsize 
>>>>>>>>> 12 
>>>>>>>>> --font=Arial Bold 
>>>>>>>>> --outputbase=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt 
>>>>>>>>> --text=/tmp/font_tmp.s9cdSHrzKS/sample_text.txt 
>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.s9cdSHrzKS*
>>>>>>>>> *Unable to open '/tmp/font_tmp.s9cdSHrzKS/fonts.conf' for writing*
>>>>>>>>> ----> Here, I did not give Arial Bold as input. --outputbase should 
>>>>>>>>> create the temp folder 'font_tmp.s9cdSHrzKS', but it is not created, 
>>>>>>>>> and the same goes for --fontconfig_tmpdir. So I get the write error.
>>>>>>>>>
>>>>>>>>> *Fontconfig error: Cannot load default config file*
>>>>>>>>> ----> To resolve this error, I added 
>>>>>>>>> FONTCONFIG_FILE=%WINDIR%\fonts.conf to the environment variables 
>>>>>>>>> (following https://forums.wesnoth.org/viewtopic.php?t=22821), 
>>>>>>>>> but it is still not resolved.
>>>>>>>>>
>>>>>>>>> I also ran *text2image.exe --list_available_fonts*.
>>>>>>>>> After I pressed Enter, I got: Fontconfig warning: 
>>>>>>>>> "/tmp\fonts.conf", line 4: empty font directory name ignored
>>>>>>>>>
>>>>>>>>> The contents of the fonts.conf file that gets created are:
>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>> <!DOCTYPE fontconfig SYSTEM "fonts.dtd">
>>>>>>>>> <fontconfig>
>>>>>>>>> <dir></dir>
>>>>>>>>> <cachedir>/tmp</cachedir>
>>>>>>>>> <config></config>
>>>>>>>>> </fontconfig>
>>>>>>>>>
>>>>>>>>> Can you please help me resolve this? Am I giving the correct 
>>>>>>>>> tesstrain.sh command and arguments?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Samruddhi
>>>>>>>>> On Monday, August 30, 2021 at 5:12:21 PM UTC+5:30 zdenop wrote:
>>>>>>>>>
>>>>>>>>>> First of all: use quotes for multi word names, or escape 
>>>>>>>>>> space/special symbols (e.g. --font="Arial Bold")
>>>>>>>>>> Next: fix error message: "Unable to open 
>>>>>>>>>> '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing" 
>>>>>>>>>> Next: check available font for text2image with option 
>>>>>>>>>> --list_available_fonts
>>>>>>>>>> etc...
>>>>>>>>>>
>>>>>>>>>> PS: I would suggest using linux for training instead of windows 
>>>>>>>>>> (e.g. in WSL[1])
>>>>>>>>>> [1] https://docs.microsoft.com/en-us/windows/wsl/install-win10
>>>>>>>>>>
>>>>>>>>>> Zdenko
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, 30 Aug 2021 at 12:12, Samruddhi Dhake <[email protected]> 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Text2Image error is gone. I am getting *font-config error*.
>>>>>>>>>>>
>>>>>>>>>>> SDE26@DTP-SDE26-IND /cygdrive/c/Program Files/Tesseract-OCR
>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts 
>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties 
>>>>>>>>>>> --langdata_dir 
>>>>>>>>>>> "C:/Program Files/Tesseract-OCR/langdata" --tessdata_dir 
>>>>>>>>>>> "C:/Program 
>>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir D:\Test\trainneddata
>>>>>>>>>>> Creating new directory D:Testtrainneddata
>>>>>>>>>>>
>>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>>> [Mon Aug 30 15:34:53 IST 2021] /cygdrive/c/Program 
>>>>>>>>>>> Files/Tesseract-OCR/text2image --fonts_dir=C:/Windows/Fonts 
>>>>>>>>>>> --ptsize 12 
>>>>>>>>>>> --font=Arial Bold 
>>>>>>>>>>> --outputbase=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt 
>>>>>>>>>>> --text=/tmp/font_tmp.hbC9F3LEQX/sample_text.txt 
>>>>>>>>>>> --fontconfig_tmpdir=/tmp/font_tmp.hbC9F3LEQX
>>>>>>>>>>> Unable to open '/tmp/font_tmp.hbC9F3LEQX/fonts.conf' for writing
>>>>>>>>>>> Fontconfig error: Cannot load default config file
>>>>>>>>>>> Could not find font named 'Arial Bold'.
>>>>>>>>>>> Please correct --font arg.
>>>>>>>>>>> ERROR: Program Program failed. Abort.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have the Arial Bold font on my machine, so I don't know why it 
>>>>>>>>>>> cannot be found. Also, the /tmp/ folder contains no 
>>>>>>>>>>> font_tmp.hbC9F3LEQX directory, which would explain why fonts.conf 
>>>>>>>>>>> cannot be opened for writing.
>>>>>>>>>>> How can I resolve this?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Samruddhi
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, August 25, 2021 at 8:18:47 PM UTC+5:30 zdenop 
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Honestly, I have no clue what you are doing: text2image is at 
>>>>>>>>>>>> the same location as the tesseract executable. So if you have 
>>>>>>>>>>>> tesseract in 
>>>>>>>>>>>> the path, text2image must work too. 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 25 Aug 2021 at 16:26, Samruddhi Dhake <[email protected]> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> As you suggested, I installed Tesseract v5.0.0 on my Windows 
>>>>>>>>>>>>> machine  (Index of /tesseract (uni-mannheim.de) 
>>>>>>>>>>>>> <https://digi.bib.uni-mannheim.de/tesseract/>). This included 
>>>>>>>>>>>>> training tools too.
>>>>>>>>>>>>> I performed all the previous steps (box file, lstmf file, 
>>>>>>>>>>>>> unicharset).
>>>>>>>>>>>>>
>>>>>>>>>>>>> But after running the tesstrain.sh command in Cygwin, I still 
>>>>>>>>>>>>> get the following error:
>>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir C:/Windows/Fonts 
>>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties 
>>>>>>>>>>>>> --langdata_dir 
>>>>>>>>>>>>> "C:/Program Files/Tesseract-OCR/langdata" --tessdata_dir 
>>>>>>>>>>>>> "C:/Program 
>>>>>>>>>>>>> Files/Tesseract-OCR/tessdata" --output_dir 
>>>>>>>>>>>>> D:/Bugs/1206806/folder/trainneddata
>>>>>>>>>>>>> Creating new directory D:/Bugs/1206806/folder/trainneddata
>>>>>>>>>>>>>
>>>>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>>>>> which: no text2image in 
>>>>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program Files/Microsoft 
>>>>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files 
>>>>>>>>>>>>> (x86)/NVIDIA 
>>>>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files 
>>>>>>>>>>>>> (x86)/Intel/Intel(R) 
>>>>>>>>>>>>> Management Engine Components/iCLS:/cygdrive/c/Program 
>>>>>>>>>>>>> Files/Intel/Intel(R) 
>>>>>>>>>>>>> Management Engine 
>>>>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>  
>>>>>>>>>>>>> Files (x86)/Microsoft SQL 
>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>> Files/Microsoft SQL Server/100/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>> Files/Microsoft SQL Server/100/DTS/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>> Files/Microsoft/Web Platform Installer:/cygdrive/c/Program Files 
>>>>>>>>>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web 
>>>>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL 
>>>>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program
>>>>>>>>>>>>>  
>>>>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files (x86)/Windows 
>>>>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program 
>>>>>>>>>>>>> Files/Microsoft 
>>>>>>>>>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files (x86)/Windows 
>>>>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program Files 
>>>>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1 6.0.20/bin:/cygdrive/c/Program 
>>>>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL 
>>>>>>>>>>>>> Server/Client 
>>>>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>  
>>>>>>>>>>>>> Files (x86)/Microsoft SQL 
>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>> Files/Microsoft SQL Server/150/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>> Files 
>>>>>>>>>>>>> (x86)/Microsoft SQL Server/150/DTS/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>> Files/Microsoft 
>>>>>>>>>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files (x86)/Microsoft 
>>>>>>>>>>>>> SQL 
>>>>>>>>>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>> Files 
>>>>>>>>>>>>> (x86)/Microsoft SQL Server/140/DTS/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>> (x86)/Microsoft SQL 
>>>>>>>>>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools)
>>>>>>>>>>>>> which: no text2image in (./api)
>>>>>>>>>>>>> which: no text2image in (./training)
>>>>>>>>>>>>> ERROR: 'text2image' not found
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am I missing something? Can you please guide me?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 5:59:49 PM UTC+5:30 Samruddhi 
>>>>>>>>>>>>> Dhake wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you please provide link for steps to install Tesseract 
>>>>>>>>>>>>>> and training tools on Windows?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 3:42:48 PM UTC+5:30 Samruddhi 
>>>>>>>>>>>>>> Dhake wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How to install tesseract and training tools on Windows? 
>>>>>>>>>>>>>>> Do I have to install Tesseract Windows exe?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 3:20:37 PM UTC+5:30 zdenop 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So there are only 2 possibilities:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    1. Install tesseract and training tools
>>>>>>>>>>>>>>>>    2. Learn how to handle and use software that is not 
>>>>>>>>>>>>>>>>    installed. This option is not related to tesseract.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, 24 Aug 2021 at 9:17, Samruddhi Dhake <[email protected]> 
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I haven't installed Tesseract. I keep it in a folder and 
>>>>>>>>>>>>>>>>> run the exe by giving its path. I built the training tools 
>>>>>>>>>>>>>>>>> from source.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To create the box file, command-> (I gave the absolute path 
>>>>>>>>>>>>>>>>> of tesseract.exe)
>>>>>>>>>>>>>>>>> ..\tesseract.exe Dim4.tif Dim4 lstmbox
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To create the lstmf file, command->
>>>>>>>>>>>>>>>>> tesseract.exe Dim4.tif Dim4 lstm.train
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To create unicharset, command->
>>>>>>>>>>>>>>>>> unicharset_extractor.exe --output_unicharset 
>>>>>>>>>>>>>>>>> ..\own.unicharset ..\langdata\eng\eng.training_text
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And to create the traineddata, I use the tesstrain.sh command:
>>>>>>>>>>>>>>>>> .\src\training\tesstrain.sh --fonts_dir C:\Windows\Fonts 
>>>>>>>>>>>>>>>>> --lang eng --linedata_only --noextract_font_properties 
>>>>>>>>>>>>>>>>> --langdata_dir 
>>>>>>>>>>>>>>>>> langdata --tessdata_dir tessdata --output_dir trainneddata
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>>>> On Tuesday, August 24, 2021 at 12:24:29 PM UTC+5:30 
>>>>>>>>>>>>>>>>> Samruddhi Dhake wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have generated training tools through source code.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Monday, August 23, 2021 at 7:09:02 PM UTC+5:30 zdenop 
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How did you install tesseract? Did you also install 
>>>>>>>>>>>>>>>>>>> training tools?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, 23 Aug 2021 at 15:34, Samruddhi Dhake <[email protected]> 
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am creating my own traineddata using tesseract 
>>>>>>>>>>>>>>>>>>>> v4.1.1 on Windows 10.
>>>>>>>>>>>>>>>>>>>> I am following the documentation at 
>>>>>>>>>>>>>>>>>>>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have successfully created the .box file and the .lstmf 
>>>>>>>>>>>>>>>>>>>> file using lstmbox and lstm.train respectively.
>>>>>>>>>>>>>>>>>>>> As the next step, I installed Cygwin to run the tesstrain.sh 
>>>>>>>>>>>>>>>>>>>> command to create the training data.
>>>>>>>>>>>>>>>>>>>> But I am getting the error below.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> $ ./src/training/tesstrain.sh --fonts_dir 
>>>>>>>>>>>>>>>>>>>> C:/Windows/Fonts --lang eng --linedata_only 
>>>>>>>>>>>>>>>>>>>> --noextract_font_properties 
>>>>>>>>>>>>>>>>>>>> --langdata_dir ./langdata --tessdata_dir ./tessdata 
>>>>>>>>>>>>>>>>>>>> --output_dir 
>>>>>>>>>>>>>>>>>>>> ./trainneddata
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> === Starting training for language 'eng'
>>>>>>>>>>>>>>>>>>>> which: no text2image in 
>>>>>>>>>>>>>>>>>>>> (/usr/local/bin:/usr/bin:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>> Files/Microsoft 
>>>>>>>>>>>>>>>>>>>> MPI/Bin:/cygdrive/c/buildtools:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>> (x86)/NVIDIA 
>>>>>>>>>>>>>>>>>>>> Corporation/PhysX/Common:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>> (x86)/Intel/Intel(R) 
>>>>>>>>>>>>>>>>>>>> Management Engine Components/iCLS:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>> Files/Intel/Intel(R) 
>>>>>>>>>>>>>>>>>>>> Management Engine 
>>>>>>>>>>>>>>>>>>>> Components/iCLS:/cygdrive/c/Python25:/cygdrive/c/ProgramData/Oracle/Java/javapath:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Oracle12C_64bCli/client_1/bin:/cygdrive/c/Oracle12C_32bCli/client_1/bin:/cygdrive/c/windows/system32:/cygdrive/c/windows:/cygdrive/c/windows/System32/Wbem:/cygdrive/c/windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/windows/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> Files (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/100/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/100/DTS/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>> Files/Microsoft/Web Platform Installer:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>> Files 
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft ASP.NET/ASP.NET Web 
>>>>>>>>>>>>>>>>>>>> Pages/v1.0:/cygdrive/c/Program Files/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/110/Tools/Binn:/cygdrive/c/windows/system32/config/systemprofile/.dnx/bin:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> Files/Microsoft DNX/Dnvm:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>> (x86)/Windows 
>>>>>>>>>>>>>>>>>>>> Kits/8.1/Windows Performance Toolkit:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>> Files/Microsoft 
>>>>>>>>>>>>>>>>>>>> SQL Server/130/Tools/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>> (x86)/Windows 
>>>>>>>>>>>>>>>>>>>> Kits/10/Windows Performance Toolkit:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>> Files 
>>>>>>>>>>>>>>>>>>>> (x86)/Oracle/Berkeley DB 12cR1 
>>>>>>>>>>>>>>>>>>>> 6.0.20/bin:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>> Files/dotnet:/cygdrive/c/Program Files/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/Client 
>>>>>>>>>>>>>>>>>>>> SDK/ODBC/170/Tools/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>> (x86)/IncrediBuild:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/WINDOWS/System32/WindowsPowerShell/v1.0:/cygdrive/c/WINDOWS/System32/OpenSSH:/cygdrive/c/Program
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> Files (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>> Files/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/150/Tools/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/150/DTS/Binn:/cygdrive/c/Program Files/Microsoft 
>>>>>>>>>>>>>>>>>>>> SQL Server/150/DTS/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/Client SDK/ODBC/130/Tools/Binn:/cygdrive/c/Program 
>>>>>>>>>>>>>>>>>>>> Files 
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/140/Tools/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/140/DTS/Binn:/cygdrive/c/Program Files 
>>>>>>>>>>>>>>>>>>>> (x86)/Microsoft SQL 
>>>>>>>>>>>>>>>>>>>> Server/140/Tools/Binn/ManagementStudio:/cygdrive/d/Git/cmd:/cygdrive/c/Users/sde26/AppData/Local/Microsoft/WindowsApps:/cygdrive/c/Users/sde26/.dotnet/tools)
>>>>>>>>>>>>>>>>>>>> which: no text2image in (./api)
>>>>>>>>>>>>>>>>>>>> which: no text2image in (./training)
>>>>>>>>>>>>>>>>>>>> ERROR: 'text2image' not found
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I found that text2image is built by running the command 
>>>>>>>>>>>>>>>>>>>> 'make training'.
>>>>>>>>>>>>>>>>>>>> Can you please help me with how this can be done on Windows 
>>>>>>>>>>>>>>>>>>>> 10?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>> Samruddhi
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>>>>> You received this message because you are subscribed to 
>>>>>>>>>>>>>>>>>>>> the Google Groups "tesseract-ocr" group.
>>>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving 
>>>>>>>>>>>>>>>>>>>> emails from it, send an email to 
>>>>>>>>>>>>>>>>>>>> [email protected].
>>>>>>>>>>>>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/5adf563d-117b-4bd8-a283-dd21e53575f4n%40googlegroups.com
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/5adf563d-117b-4bd8-a283-dd21e53575f4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/89197941-16d3-4747-b280-95ddb9979b40n%40googlegroups.com.
