Re: [tesseract-ocr] How to continue training with Makefile + EPOCHS

2023-12-05 Thread Keith Smith
From one novice to another ... 1. Yes, that is my understanding of how to run further iterations. 2. Yes, EPOCHS says to iterate that many times over your set of tests. I think I have heard the recommended number of EPOCHS in general is 2, though I don't know how much science is behind that.
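
As a rough illustration of what an epoch means here: one epoch is one pass over every training line, so the total number of iterations grows with the size of the training set. The sketch below is plain arithmetic rather than tesstrain code, and it assumes one training line is consumed per iteration. The function name and the example line count are hypothetical.

```python
def iterations_for_epochs(num_training_lines: int, epochs: int) -> int:
    """Total iterations if every training line is visited once per epoch.

    Assumption: one training line is consumed per iteration.
    """
    if num_training_lines < 0 or epochs < 0:
        raise ValueError("counts must be non-negative")
    return num_training_lines * epochs


if __name__ == "__main__":
    # Hypothetical training set of 18,000 lines, trained for 2 epochs.
    print(iterations_for_epochs(18_000, 2))
```

Under that assumption, doubling the training set at a fixed number of epochs doubles the training time, which is worth keeping in mind when choosing between an epoch count and a fixed iteration limit.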

Re: [tesseract-ocr] Any success story?

2023-11-14 Thread Keith Smith
The short answer is "no", but a fuller answer is that my use case is a bit different from others and is as follows ... I trained tesseract to read the MICR line at the bottom of bank checks using only 20K checks (i.e. real data, not synthetic). I was able to get 85% accuracy where the reason for

Re: [tesseract-ocr] LSTM-based training produces .box files with the same coordinates

2023-11-01 Thread Keith Smith
fyi, I asked the same question in https://groups.google.com/g/tesseract-ocr/c/9myrnSD0HKM On Wednesday, November 1, 2023 at 7:21:37 AM UTC-4 zdenop wrote: > Are you following official tutorials? > Did you read the documentation? > Have you tried to check the official training repository and

Re: Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-23 Thread Keith Smith
On Sat, Oct 21, 2023 at 17:18, Keith Smith <keithsmith...@gmail.com> wrote: Thank you Des for your help in this community. It is greatly appreciated! As one who is struggling, may I make a suggestion? I have started a Google Doc here<https://urldefense.com/v3/__https://docs.g

Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-21 Thread Keith Smith
Thank you Des for your help in this community. It is greatly appreciated! As one who is struggling, may I make a suggestion? I have started a Google Doc here with a suggested format for a tutorial

[tesseract-ocr] Error using tesstrain with START_MODEL - failed to continue

2023-10-19 Thread Keith Smith
Hi, Could someone help me understand why I am getting the following error when using tesstrain with the START_MODEL option? Failed to continue from: data/micr_ref/micr.lstm From my local tesstrain repo (cloned from https://github.com/tesseract-ocr/tesstrain), I have the following in

[tesseract-ocr] tesstrain help needed - failed to continue

2023-10-19 Thread Keith Smith
Hi, thanks in advance for your help. I am trying to use tesstrain to train tesseract to read the MICR line of checks, but am getting a "failed to continue" error as described below. Perhaps I am misunderstanding how to use tesstrain. Here is my data directory in my tesstrain directory: data

Re: [tesseract-ocr] How to generate training images with noise

2023-10-18 Thread Keith Smith
g at https://github.com/tesseract-ocr/tesstrain/blob/main/generate_line_box.py#L26 Shouldn't the box file coordinates be different for each character? Thanks, Keith On Fri, Oct 13, 2023 at 10:59 AM Keith Smith wrote: > Thanks Shree for the clarification. I'll give it a try. I was foll
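
For context, my understanding is that a line-level box file for LSTM training gives every character the bounding box of the whole text line and closes the line with a tab entry, which would explain identical coordinates. The following is a minimal sketch of that idea under that assumption, not the actual generate_line_box.py. The function name and sample dimensions are made up, and combining characters are ignored.

```python
def line_level_box(text: str, width: int, height: int) -> str:
    """Build line-level box content: one row per character, all sharing
    the full-line box (0, 0, width, height), plus a closing tab row.

    Hypothetical helper for illustration; it does not handle combining
    characters.
    """
    rows = [f"{ch} 0 0 {width} {height} 0" for ch in text]
    # The tab row marks the end of the text line.
    rows.append(f"\t 0 0 {width} {height} 0")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical 300x40 pixel line image containing the text "0123".
    print(line_level_box("0123", 300, 40))
```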

Re: [tesseract-ocr] How to generate training images with noise

2023-10-13 Thread Keith Smith
eract-ocr/tesstrain/wiki > > It has details about training using the makefile. > > On Fri, Oct 13, 2023, 3:43 PM Keith Smith wrote: > >> Yes I have. I am asking about how to automate the generation of the >> ground truth images and box files, because from what I understand, >>

Re: [tesseract-ocr] How to generate training images with noise

2023-10-13 Thread Keith Smith
/tesstrain assumes the ground truth (images + box files) already exist. On Fri, Oct 13, 2023 at 1:00 AM Shree Devi Kumar wrote: > Have you looked at > > https://github.com/tesseract-ocr/tesstrain > > > > On Thu, Oct 12, 2023, 11:45 PM Keith Smith > wrote: > >>

[tesseract-ocr] How to generate training images with noise

2023-10-12 Thread Keith Smith
Hello, I am trying to use tesseract to OCR the MICR line of checks (i.e. the micr-e13b font). The training data that I found at https://github.com/BigPino67/Tesseract-MICR-OCR/blob/master/Tessdata/mcr.traineddata does not produce accurate results on my data set. I have a set of over 20K

[tesseract-ocr] OCR various fields of bank check in TIFF format

2023-08-08 Thread Keith Smith
Hello, I have several X9.37 files and would like to use tesseract to OCR the check images in TIFF format and compare the OCR results with those fields in the X9.37 file. If the results of my tesseract OCR do not match the values in the X9.37 file, then I'd like to flag the check for manual
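
The comparison step described above could be sketched roughly as follows, assuming the X9.37 values and the OCR results have already been extracted into dictionaries keyed by field name. The function name and the field names (routing, account, amount) are hypothetical, and whitespace is ignored when comparing.

```python
from typing import Dict, List


def mismatched_fields(expected: Dict[str, str], ocr: Dict[str, str]) -> List[str]:
    """Return the names of fields whose OCR value differs from the
    expected (X9.37) value. A field missing from the OCR result counts
    as a mismatch. Whitespace is ignored in the comparison.
    """
    def norm(value: str) -> str:
        return "".join(value.split())

    return [
        name
        for name, value in expected.items()
        if norm(ocr.get(name, "")) != norm(value)
    ]


if __name__ == "__main__":
    # Made-up values for illustration only.
    x937 = {"routing": "123456789", "account": "000111222", "amount": "1500"}
    ocr = {"routing": "123456789", "account": "000111 222", "amount": "7500"}
    flagged = mismatched_fields(x937, ocr)
    if flagged:
        print("flag for manual review:", flagged)
```

A check would be flagged whenever the returned list is non-empty; stricter or looser normalization can be swapped in depending on how each field is encoded.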