Yeah, I gave it quite a while to complete and it was still stuck on the same 
text2image call. Upon inspection, I see that it's hanging after the eighth 
call to text2image during Phase I, when the synthetic images are being 
generated. I get the same behavior with the unmodified tesstrain 
scripts as well. Do you know if there's an easy way to force tesstrain.sh 
to process files sequentially?
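
For reference, here is a rough sketch of the "batches of 8" pattern and what 
a sequential run would look like. It is not the actual tesstrain code: 
run_in_batches and fake_job are hypothetical names, and fake_job only stands 
in for a text2image call. In the copies of tesstrain.sh I'm aware of, Phase I 
is started with a numeric argument (something like phase_I_generate_image 8), 
so lowering that number to 1 may be enough, but that is an assumption to check 
against your own copy of the scripts.

```shell
#!/usr/bin/env bash
# Sketch of batch-limited background jobs. run_in_batches and fake_job are
# hypothetical names for illustration, not part of tesstrain.sh.

# Stand-in for one text2image invocation.
fake_job() {
    echo "rendered $1"
}

# Usage: run_in_batches PAR_FACTOR ITEM...
# Starts each job in the background and waits after every PAR_FACTOR jobs.
# With PAR_FACTOR=1 each job finishes before the next one starts.
run_in_batches() {
    local par_factor=$1
    shift
    local counter=0
    local item
    for item in "$@"; do
        fake_job "$item" &
        counter=$((counter + 1))
        if (( counter % par_factor == 0 )); then
            wait
        fi
    done
    wait  # collect any jobs left over from a partial final batch
}

# Sequential run: one job at a time.
run_in_batches 1 fontA fontB fontC
```

With a factor of 8 the same loop starts eight jobs before the first wait, 
so a single job that never exits would stall the whole batch at that point.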

I'll be sure to check out ocr-d/train. 

On Friday, January 4, 2019 at 12:51:57 PM UTC-5, shree wrote:
>
> You can also try the ocr-d/train project which can train using scanned 
> images.
>
> On Fri, 4 Jan 2019, 12:44 Shree Devi Kumar <shree...@gmail.com> wrote:
>
>> tesstrain.sh is set up to process files in batches of 8 simultaneously. 
>> Are you allowing the script to run to completion?
>>
>> On Fri, 4 Jan 2019, 11:27 <tc...@zips.uakron.edu> wrote:
>>
>>> Hey all,
>>>
>>> I'm currently working on a program that explores the handwritten OCR 
>>> capabilities of Tesseract.
>>>
>>> I have ~1400 images with ~8 handwritten text lines per image 
>>> with accompanying BOX files. Additionally, I've got a couple of handwritten 
>>> fonts that I'm using to bootstrap the training process.
>>>
>>> One problem I'm having is that when I invoke tesstrain.sh, it will 
>>> consistently fail at some point (mostly around Phase E) when more than 7 
>>> box/tif pairs or fonts are provided as input. I've tried combinations where 
>>> all the inputs are font files, all inputs are handwritten tif/box pairs, 
>>> and inputs as a mix of the two.
>>>
>>> I had originally tried using Shree's modified boxtrain files but was 
>>> receiving an error that had to do with failing to read in a unicharset 
>>> file. So, I modified tesstrain.sh and tesstrain_utils.sh (referencing 
>>> Shree's modified scripts) myself to work with my own provided tif/box pairs.
>>>
>>> Is there a limit to the number of inputs to tesstrain.sh that should be 
>>> followed or should I confidently be able to give tesstrain.sh all 1400 of 
>>> my images no problem?
>>>
>>> Thanks,
>>> Tim Snyder
>>>
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/dba86440-e325-4156-bfc7-85a1a680c63e%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

