[tesseract-ocr] Re: Tesseract 3.05 build on windows x86

2021-08-20 Thread Quan Nguyen
You can download a prebuilt Tesseract executable for Windows from:

https://digi.bib.uni-mannheim.de/tesseract/

On Thursday, July 22, 2021 at 3:38:26 AM UTC-5 luys...@gmail.com wrote:

> Hello everybody,
> I'm trying to build Tesseract 3.05 and Leptonica 1.74 on x86 Windows,
> but I can't find the image libraries (libjpeg, libtiff, libpng, and so on)
> that Leptonica needs to build.
> Can anyone help me with this? Thanks in advance.
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/e09e5698-f7cf-4e00-9ba8-ea5f0a295ca2n%40googlegroups.com.


[tesseract-ocr] Re: Creating training data for a language with a complex name, like ita_old or chi_sim_vert

2021-08-20 Thread Quan Nguyen
Pick a language name that tesstrain.sh accepts, then rename the output file 
to the name you want.
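
The rename step could be sketched as follows. This assumes that training 
writes a file named <lang>.traineddata into the output directory and that 
"eng" is a name the script accepts; the helper rename_traineddata and the 
temporary directory are hypothetical and only there to make the sketch 
runnable.

```python
from pathlib import Path
import tempfile


def rename_traineddata(output_dir, accepted_lang, desired_lang):
    """Rename <accepted_lang>.traineddata to <desired_lang>.traineddata.

    Assumption: the training output is a single file named after the
    language code that was passed to the training script.
    """
    src = Path(output_dir) / f"{accepted_lang}.traineddata"
    dst = src.with_name(f"{desired_lang}.traineddata")
    if dst.exists():
        # Refuse to overwrite an existing model with the desired name.
        raise FileExistsError(dst)
    src.rename(dst)
    return dst


# Demo in a throwaway directory; the empty file stands in for a real model.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "eng.traineddata").write_bytes(b"")
    renamed = rename_traineddata(tmp, "eng", "eng_old")
    print(renamed.name)
```

The renamed file can then be used by passing the new name as the language.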

On Wednesday, August 18, 2021 at 5:40:05 AM UTC-5 smn...@gmail.com wrote:

> Hello,
>
> I am trying to create training data for a language with a complex name, 
> similar to ita_old or chi_sim_vert. However, when I run the command:
>
> tesstrain.sh --lang eng_old  --fonts_dir 
>
> I get this error:
>
> === Starting training for language 'eng_old'
> ERROR: Error: eng_old is not a valid language code
>
> How can I get tesstrain.sh to accept 'eng_old' the way it accepts 
> 'ita_old'?
>
> Thank you in advance!
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/03ee4814-44a9-49a5-917f-6ce88b0cbe09n%40googlegroups.com.


[tesseract-ocr] Re: shapeclustering error bad_alloc on Tesseract 3.05

2021-08-20 Thread Quan Nguyen
The command line imposes a limit on the command length. You can group 
images of the same font into a multi-page TIFF to cut down the number of 
files, and then run training on that.
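
A minimal sketch of the grouping step, assuming the page images follow the 
same lang.fontname.expN naming as the .tr files in the question and carry a 
.tif extension. The helper group_by_font is hypothetical. It only decides 
which pages belong together; writing each group into a multi-page TIFF 
needs a separate imaging tool and is not shown.

```python
from collections import defaultdict
from pathlib import Path


def group_by_font(filenames):
    """Group files named lang.fontname.expN.ext by (lang, fontname).

    Assumption: every name has exactly four dot-separated parts.
    Input order is preserved inside each group.
    """
    groups = defaultdict(list)
    for name in filenames:
        parts = Path(name).name.split(".")
        if len(parts) != 4:
            raise ValueError(f"unexpected file name: {name}")
        lang, font, _exp, _ext = parts
        groups[(lang, font)].append(name)
    return dict(groups)


# Hypothetical page images, one file per page.
pages = [
    "lang.fontname0.exp0.tif",
    "lang.fontname0.exp1.tif",
    "lang.fontname1.exp0.tif",
]
groups = group_by_font(pages)
for (lang, font), files in groups.items():
    # Each group would become one multi-page TIFF for that font.
    print(f"{lang}.{font}: {len(files)} page(s)")
```

With one file per font, the argument list passed to the training tools 
shrinks from one entry per page to one entry per font.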

On Monday, August 16, 2021 at 10:32:52 PM UTC-5 gan...@gmail.com wrote:

> Tesseract 3.05. I have 2,000 .tr files. I ran the following command:
> shapeclustering -f font_properties -u unicharset lang.fontname0.exp0.tr 
> lang.fontname1.exp0.tr….  
> But when it gets to 200, it fails with the error std::bad_alloc.
> I have 32 GB of RAM. How can I solve this error?
>
> Thanks in advance
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/ec437ee3-7bed-43f1-96e1-b3af2153f79dn%40googlegroups.com.