Hi,

I am also trying to train Tesseract 4.0 for handwritten digits. I want to 
know the best way to create pairs of [*.tif, *.gt.txt] like yours, with 
binarized chars and TTFs from two fonts (1869 text lines in total). Are 
you using any specific tool to generate the *.tif and *.gt.txt files?
I have data like the TIFF image below. Please guide me.
Thank you

<https://lh3.googleusercontent.com/-wdzw32GT4fk/W04iwd71ldI/AAAAAAAAJFA/lx3BfSnCujkKmch4oGRSJLFgkKG1uvuTgCLcBGAs/s1600/SCAN_20180716_145539118.tiff>


On Wednesday, July 4, 2018 at 8:20:54 PM UTC+5:30, Joe wrote:
>
> Hi everybody!
>
> I'm trying this tool https://github.com/OCR-D/ocrd-train/ but without 
> success so far. Tesseract and Leptonica are installed by the scripts.
> Inspired by the test set provided in that repo, I created pairs of [*.tif, 
> *.gt.txt] with binarized chars and TTFs from two fonts (1869 text lines in 
> total).
> You can see an example of my set in the attachment, which also contains 
> files created by the training process.
>
> My guess is that something is wrong with my data.
> Sometimes I can see the char train value increasing instead of decreasing, 
> and the final error rate is still too high (about 60%).
>
> That new training process with LSTM is driving me crazy!
> I would appreciate it if anyone with experience could take a look at my 
> data set.
>
>
> Joe.
>
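
Since the suspicion above is that something is wrong with the data, one cheap first step is to confirm that every line image really has a matching one-line transcription. The sketch below is only an illustration: it assumes the layout described in the ocrd-train README as I understand it (each line image `NAME.tif` sits next to a single-line `NAME.gt.txt` in one folder). The function name `check_pairs` and the folder argument are hypothetical, not part of ocrd-train or Tesseract, and the script only looks at file names and text content, never at the images themselves.

```python
from pathlib import Path


def check_pairs(data_dir):
    """Return a list of human-readable problems found in data_dir.

    Assumption: ground truth is stored as NAME.tif + NAME.gt.txt pairs,
    with exactly one line of text per .gt.txt file.
    """
    data_dir = Path(data_dir)
    problems = []

    # Collect base names by stripping the known suffixes.
    images = {p.name[:-len(".tif")] for p in data_dir.glob("*.tif")}
    texts = {p.name[:-len(".gt.txt")] for p in data_dir.glob("*.gt.txt")}

    # Files that have no partner.
    for stem in sorted(images - texts):
        problems.append(f"{stem}.tif has no matching {stem}.gt.txt")
    for stem in sorted(texts - images):
        problems.append(f"{stem}.gt.txt has no matching {stem}.tif")

    # Transcriptions that are empty or span more than one line.
    for stem in sorted(images & texts):
        text = (data_dir / f"{stem}.gt.txt").read_text(encoding="utf-8")
        lines = text.splitlines()
        if not text.strip():
            problems.append(f"{stem}.gt.txt is empty")
        elif len(lines) > 1:
            problems.append(f"{stem}.gt.txt has {len(lines)} lines, expected 1")

    return problems
```

Calling `check_pairs("path/to/your/ground-truth-folder")` and printing each returned string gives a quick list of unpaired or malformed files. An empty list only means the file layout looks consistent; it says nothing about image quality or whether the transcriptions are correct.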
