Thanks, Quan - excuse the brevity, sent from mobile
On 08-Feb-2017 7:33 PM, "Quan Nguyen" <[email protected]> wrote:
>
> On Tuesday, February 7, 2017 at 9:34:11 AM UTC-6, shree wrote:
>>
>> For LSTM training, box files need an additional line for each
>> text line, containing the tab character to indicate a new line.
>>
>> If you have existing box/tiff pairs, you can use a box editor (such as
>> jTessBoxEditor), insert a box at the end of each line and put a tab
>> character in it.
>>
>
> The jTessBoxEditor beta version has a new Mark EOL function that does just
> that.
>
> On the toolbar, the Character textbox has a built-in conversion
> function: if you enter U+0009 and press Enter or click the adjacent
> Tool icon, the escape sequence is converted to Unicode. You can also
> enter the tab character with Alt+09 on the numeric keypad on Windows.
>
>> Or add a dummy sequence such as @@@ and then replace it with a tab
>> character in a text editor.
>>
>> See attached files as a sample.
>>
>> Then modify tesstrain.sh to copy the box/tiff pairs to the training
>> directory before training starts:
>>
>>     mkdir -p ${TRAINING_DIR}
>>     tlog "\n=== Starting training for language '${LANG_CODE}'"
>>
>>     cp ./*.box "${TRAINING_DIR}/"
>>     cp ./*.tif "${TRAINING_DIR}/"
>>
>> On Tue, Feb 7, 2017 at 8:27 PM, Kay-Michael Würzner <[email protected]>
>> wrote:
>>
>>> +1 for this question. The training documentation for Tesseract 4.0 so
>>> far only covers training with font files (synthetic material). What is
>>> missing is information on training with real data (i.e. manually aligned
>>> ground truth).
>>> Any hints on that matter are greatly appreciated.
>>>
>>> Cheers,
>>> Kay
>>>
>>> On Wednesday, January 18, 2017 at 12:31:54 AM UTC+1, [email protected]
>>> wrote:
>>>>
>>>> I have a bunch of images containing English words.
>>>> I would like to generate training data from these images and do the
>>>> training.
>>>> How should I do this?
>>>>
>>>> Thanks a lot.
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7bffab95-3e6b-4165-929e-a152f1799703%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
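For anyone who prefers to script these steps instead of using a box editor, here is a minimal shell sketch. It assumes the usual box line layout "char left bottom right top page", that an end-of-line box is a line whose character field is a tab, and that a dummy placeholder such as @@@ was typed as the character field followed by a space. The function names marker_to_tab and add_eol_boxes are hypothetical, and the line-break heuristic in add_eol_boxes (including the size of the box it inserts) is an assumption, not a description of how jTessBoxEditor's Mark EOL works.

```shell
#!/usr/bin/env bash
# Sketch of two helpers for preparing LSTM box files from existing
# box/tiff pairs. Function names and the "@@@" placeholder are hypothetical.
set -euo pipefail

# Replace a placeholder typed into the character field (e.g. "@@@") with a
# literal tab, leaving the coordinates untouched. Edits the files in place.
marker_to_tab() {
  local marker=$1
  shift
  local tab
  tab=$(printf '\t')   # literal tab, avoids relying on sed's "\t" support
  local box
  for box in "$@"; do
    sed "s/^${marker} /${tab} /" "$box" > "${box}.tmp"
    mv "${box}.tmp" "$box"
  done
}

# Heuristic alternative to inserting end-of-line boxes by hand. Assumes
# left-to-right text with boxes in reading order, so a new text line starts
# when a box's left edge lies left of the previous box's left edge, or when
# the page number changes. A tab box is written after the last box of each
# line; its geometry (a narrow box just right of the last glyph, same
# vertical extent) is an assumption. Prints the result to stdout.
add_eol_boxes() {
  awk '
    function emit_eol() {
      printf "\t %d %d %d %d %d\n", pr + 1, pb, pr + 3, pt, pp
    }
    /^\t/ { print; have = 0; next }   # existing tab line: keep, do not duplicate
    {
      if (have && ($2 < pl || $6 != pp)) emit_eol()
      print
      pl = $2; pb = $3; pr = $4; pt = $5; pp = $6; have = 1
    }
    END { if (have) emit_eol() }
  ' "$1"
}

# Example usage:
#   add_eol_boxes page1.box > page1.eol.box
#   marker_to_tab '@@@' ./*.box
```

The heuristic only makes sense for single-column, left-to-right text whose boxes are already in reading order, so the output is worth reviewing in jTessBoxEditor before the box/tiff pairs are copied into the training directory.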

