Thanks a lot, Shree. I tried Tesseract 4.0, and the training works well until it reaches the lstm-training step, where it gets stuck. I am totally new to training, so I hope you don't mind if I ask silly questions. Do you know why it gets stuck there? Also, would you call this training fine-tuning? I just want to improve the accuracy of the existing eng.langdata.
<https://lh3.googleusercontent.com/-dWRkYql4AKA/W2k9PoNsndI/AAAAAAAAAOM/zWVkkPvUCT44moZPpvt6xgYFnQ0StwxUQCLcBGAs/s1600/Capture.PNG>

On Monday, August 6, 2018 at 10:26:12 PM UTC-7, shree wrote:
>
> Ocr-d scripts are geared towards tesseract 4.0.x. you are trying to use it
> with tesseract 3.05.
>
> On Tue 7 Aug, 2018, 10:50 AM May, <[email protected]> wrote:
>
>> Hey Shree
>>
>> I also tried with the orignal script from the github. But faced the same
>> issue with the process stuck at unicharset_output.
>>
>> <https://lh3.googleusercontent.com/-rFB69WQGLIg/W2krzHUjFfI/AAAAAAAAAOA/SZ4CEzUIEGMIhQUWXHfHMS9H4Yxk-ADGwCLcBGAs/s1600/Capture.PNG>
>>
>> These are the versions:
>> tesseract 3.05.02
>> leptonica-1.75.3
>> libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 :
>> libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0
>>
>> On Thursday, August 2, 2018 at 8:52:38 PM UTC-7, shree wrote:
>>>
>>> Please use latest scripts from https://github.com/OCR-D/ocrd-train
>>>
>>> On Fri, Aug 3, 2018 at 4:41 AM May <[email protected]> wrote:
>>>
>>>> <https://lh3.googleusercontent.com/-LnwUni4-lLw/W2OPUqJpn_I/AAAAAAAAANs/Xd_-CVCdiMk0cjMmxBpVgfOSU1JeAacAgCLcBGAs/s1600/Capture.PNG>
>>>>
>>>> <https://lh3.googleusercontent.com/-j3_B1CmVv9w/W2OPbuUYH3I/AAAAAAAAANw/xmBXrNakKuMHm2L9cj-K3sCXCjFxuF80QCLcBGAs/s1600/Capture.PNG>
>>>>
>>>> Here are attached photos
>>>>
>>>> On Thursday, August 2, 2018 at 4:08:11 PM UTC-7, May wrote:
>>>>>
>>>>> Hey all,
>>>>>
>>>>> I am following Shree's script for OCR-d in the google groups for
>>>>> ocrd-training (
>>>>> https://groups.google.com/forum/#!topic/tesseract-ocr/be4-rjvY2tQ). I
>>>>> managed to pass the combine tessdata stage but got stuck at the
>>>>> unicharset stage:
>>>>>
>>>>> I have edited the script to direct it to my path:
>>>>>
>>>>> I do find a unicharset file named "unicharset" but not as
>>>>> "my.unicharset". Changing the script by removing "my." also did not solve
>>>>> the problem. Do you know what's causing the issue?
>>>>>
>>>>> Best
>>>>> May
>>>>>
>>>
>>> --
>>>
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/29b12ff3-abac-4fe6-99af-7a8c443c5a99%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

