[tesseract-ocr] What are the Langdata repositories given for retraining Tesseract?

2021-04-14 Thread Venkatapathy S
Hi, I want to retrain Tesseract from scratch for a particular language (I have read as many resources as possible, including warnings, from the Tutorial, GitHub and

Re: [tesseract-ocr] What do iteration numbers mean in the train logging?

2021-04-14 Thread Shree Devi Kumar
https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#iterations-and-checkpoints The size of an epoch depends on your training data. If you have 1000 lines of training data, then 1 epoch is 1000 iterations. If you have 5 lines of training text, 1 epoch is 5 iterations.
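A minimal sketch of the arithmetic in the reply above, assuming (as the 1000-line and 5-line examples imply) that one iteration corresponds to one line of training data. The helper names and the 20-epoch figure are hypothetical, for illustration only; they are not part of Tesseract or tesstrain.

```python
# Hypothetical helpers illustrating the epoch/iteration relationship:
# one iteration is assumed to process one training line, so one epoch
# (a full pass over the data) takes as many iterations as there are lines.

def iterations_per_epoch(num_training_lines: int) -> int:
    """Iterations needed to go through all training lines once."""
    if num_training_lines <= 0:
        raise ValueError("num_training_lines must be positive")
    return num_training_lines


def iterations_for_epochs(epochs: int, num_training_lines: int) -> int:
    """Total iterations for a given number of epochs."""
    return epochs * iterations_per_epoch(num_training_lines)


if __name__ == "__main__":
    print(iterations_per_epoch(1000))       # 1000 lines -> 1000 iterations per epoch
    print(iterations_per_epoch(5))          # 5 lines -> 5 iterations per epoch
    print(iterations_for_epochs(20, 1000))  # 20 epochs over 1000 lines -> 20000
```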

Re: [tesseract-ocr] How to reduce the size of a OCRed pdf file using Tesseract OCR APIs.

2021-04-14 Thread Zdenko Podobny
Tesseract is an OCR engine; it does not change the input image. To recompress a PDF you need other tools, e.g. jbig2enc [1] or mupdf [2]... [1] https://github.com/agl/jbig2enc [2] https://mupdf.com/docs/manual-mutool-convert.html Zdenko On Wed, 14. 4. 2021 at 15:26, Sharp Subbu wrote: > Dear

Re: [tesseract-ocr] What do iteration numbers mean in the train logging?

2021-04-14 Thread akmalkady
I am looking for the same answer. What are the learning iteration, training iteration, and sample iteration? On Tuesday, January 1, 2019 at 6:42:16 AM UTC-5, bohdan.mo...@gmail.com wrote: > Ok, it says it’s learning iteration, training iteration and sample iteration respectively. But what do

Re: [tesseract-ocr] What is Max Iterations & Epochs in tesstrain Makefile

2021-04-14 Thread Shree Devi Kumar
See https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#lstmtraining-command-line for the command-line options. Epochs were recently added to the tesstrain Makefile; the number of epochs is converted to a number of iterations based on the amount of training data. On Wed, Apr 14, 2021, 01:36 GCP COGNEXT wrote: > Hi All, > I

Re: [tesseract-ocr] Unable to understand Iterations?

2021-04-14 Thread Shree Devi Kumar
It has seen only 600 lines of data, of which only 300 have been used for learning. Iterations are different from epochs: an epoch is one pass through all of the training data. On Wed, Apr 14, 2021, 01:36 GCP COGNEXT wrote: > What does *At Iteration 300/600/600.* > Let's assume I have 10k data and I
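The reply above, together with the earlier quote that the three numbers are the learning, training and sample iteration respectively, can be turned into a small log-reading sketch. It assumes the log line starts with the "At iteration L/T/S" triple and that the second number (training iteration) counts lines seen; the parser, the `IterationCounts` type and the 10k-line figure are illustrative assumptions, not Tesseract APIs.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical parser: matches only the "At iteration 300/600/600" prefix
# of a training log line; anything after the triple is ignored.
_ITERATION_RE = re.compile(r"At iteration (\d+)/(\d+)/(\d+)", re.IGNORECASE)


@dataclass
class IterationCounts:
    learning: int  # lines actually used for learning
    training: int  # lines of training data seen so far
    sample: int    # sample iteration counter


def parse_iteration(line: str) -> Optional[IterationCounts]:
    """Return the learning/training/sample triple, or None if absent."""
    match = _ITERATION_RE.search(line)
    if match is None:
        return None
    learning, training, sample = (int(g) for g in match.groups())
    return IterationCounts(learning, training, sample)


def epochs_seen(counts: IterationCounts, num_training_lines: int) -> float:
    """Fraction of one epoch covered, assuming one training iteration per line."""
    return counts.training / num_training_lines


if __name__ == "__main__":
    counts = parse_iteration("At iteration 300/600/600")
    print(counts)
    print(epochs_seen(counts, 10_000))  # 600 of 10k lines -> 0.06 of an epoch
```

With 10k lines of training data, 600 lines seen is only 6% of a single epoch, which matches the point that an iteration count is not an epoch count.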

Re: [tesseract-ocr] How to reduce the size of a OCRed pdf file using Tesseract OCR APIs.

2021-04-14 Thread Sharp Subbu
On Wed, Apr 14, 2021, 5:57 PM Sharp Subbu wrote: > Dear friends, > Kindly guide/help us to find a solution for the following point: > How to reduce the size of an OCRed PDF file using Tesseract OCR APIs.

Re: [tesseract-ocr] How to reduce the size of a OCRed pdf file using Tesseract OCR APIs.

2021-04-14 Thread Merlijn B.W. Wajer
Hi, On 14/04/2021 13:52, Sharp Subbu wrote: > Dear friends, > Kindly guide/help us to find a solution for the following point: > How to reduce the size of an OCRed PDF file using Tesseract OCR APIs. Not sure exactly what use case you

[tesseract-ocr] How to reduce the size of a OCRed pdf file using Tesseract OCR APIs.

2021-04-14 Thread Sharp Subbu
Dear friends, Kindly guide/help us to find a solution for the following point: How to reduce the size of an OCRed PDF file using Tesseract OCR APIs.