may be useful for your investigation.

On Wed, Dec 9, 2015 at 4:00 PM, Sriranga(83yrsold) < [email protected]> wrote:
> Tom,
> thanks for the hints. Just now I tested the eng.unicharambigs created by
> me and found it workable. - attached files will speak itself. I am happy to
> note that eng.unicharambigs works fine. also attached output
> "unicharamtest.txt" for perusal - in which however I noticed that last line
> "luck good" did not change to "good luck" - where I made mistake? your
> suggested sentence
> "Novv is the time to go dovvn" also corrected. Please note I regenerated
> eng.traineddata in ubuntu 15.10.
> With regards, sriranga(83ys)
>
> On Wed, Dec 9, 2015 at 12:04 AM, Tom Morris <[email protected]> wrote:
>
>> FreeOCR is closed source and Windows only, so it's difficult for me to
>> tell what it's doing (or even what version of Tesseract it includes).
>> However, the test case that you're using doesn't appear realistic.
>> Tesseract is optimized for recognizing words, not short random strings of
>> characters, so rather than testing on "vv w" I think you'd get more
>> representative results if you used something like "Novv is the time to go
>> dovvn" and see if it turns the vv's into w's. Having said that, vv ==> w
>> isn't an entry in the standard eng.unicharambigs. The only mandatory
>> entries are for quotes, so you could try things like `' or '` to see if
>> they get turned into ".
>>
>> As far as I know, there's no way to specify a different unicharambigs
>> file on the command line. You need to replace it in the kan.traineddata
>> file for it to be found. The combine_tessdata utility is used for packing
>> and unpacking the traineddata files. e.g.
>>
>> $ combine_tessdata -e kan.traineddata kan.unicharambigs
>> $ combine_tessdata -o kan.traineddata kan.unicharambigs
>>
>> One thing that I noticed when looking at the source is that there's an
>> upper limit of 10 characters for the bad and replacement strings, which I'm
>> not sure is documented anywhere. This should be plenty for most
>> applications, but it's something to keep in mind.
>>
>> Good luck. Let us know how you make out.
>>
>> Tom
>>
>> On Tue, Dec 8, 2015 at 4:11 AM, Sriranga(83yrsold) <
>> [email protected]> wrote:
>>
>>> Another question is how to test and add more in the <lang>unicharambigs
>>> in the tesseract-ocr . In case if it can be tested in the CMD or terminal
>>> what is the commandline to be used?
>>>
>>> On Tue, Dec 8, 2015 at 2:18 PM, Sriranga(83yrsold) <
>>> [email protected]> wrote:
>>>
>>>> Hi Tom,
>>>> attached herewith sample of post-proc.txt used in FreeOCR - which had
>>>> incorporated on my special request by creator Ralph Richardson more than 3
>>>> years back. Attached screenshots will speak itself. As a sample I have done
>>>> in English for easy understand by you.
>>>> You can test in any langs. FreeOCR available for free download.
>>>> you will notice that post-processor text sample (except no option like
>>>> 0 or 1) has similar feature available in the <lang>unicharambigs.
>>>> *Advantage of in-built* "unicharambigs" is at the time of final
>>>> output of OCRed-
>>>> all misspelling will automatically corrected before generating the
>>>> <lang>traineddata resulting the corrected tessdata can be used for any image
>>>> for correcting output text.
>>>> *disadvantage of post processor* being external program is - one
>>>> should have to update the post-proc.txt every time for each ocred
>>>> I am puzzled why unicharambigs does not work as internal program
>>>> correctly - when the post processor program works fine?
>>>> With regards,
>>>> sriranga(83yrs)
>>>>
>>>> On Mon, Dec 7, 2015 at 11:44 PM, Tom Morris <[email protected]> wrote:
>>>>
>>>>> Hi Sriranga. I haven't used the training tools, but since no one else
>>>>> has answered, I'll give it my best attempt. Shree might have better
>>>>> insights.
>>>>>
>>>>> First, a question of clarification. Are you having problems with the
>>>>> file or are you just trying to determine whether it is working properly or
>>>>> not?
>>>>>
>>>>> If you just want to see if it's working correctly, my impression is
>>>>> that most people do this empirically by a) visual inspection of the file
>>>>> to see if the substitutions look correct and b) running a corpus of text
>>>>> through to see how the contents of the file affect accuracy.
>>>>>
>>>>> To my untrained eye, the things I wonder about are:
>>>>> - are all those mandatory substitutions (lines ending in 1) correct?
>>>>> ie is it true that the string in column 1 can *never* be a valid word?
>>>>> - there is an empty line which probably should be removed
>>>>> - there are a few lines which have junk after the third column which
>>>>> don't match the specified format e.g.:
>>>>>
>>>>> ಚಟಿಲ್ಕೆ ಚಟ್ನಿ,, 1 "
>>>>> ಹೊರಿದಿವೆ ಹೊಂದಿವೆ.1 .
>>>>>
>>>>> Some of the words with embedded punctuation also look a little
>>>>> suspicious to me. Not knowing the script or language I don't know how
>>>>> common these errors are, but I'd probably start with a very basic list of
>>>>> substitutions and add to it as I found more common errors.
>>>>>
>>>>> Hopefully someone else can give you better advice which is based on
>>>>> more than bystander guesswork!
>>>>>
>>>>> Tom
>>>>>
>>>>> On Friday, December 4, 2015 at 10:36:13 PM UTC-5, sriranga(83yrsold)
>>>>> wrote:
>>>>>>
>>>>>> Solution is requested urgently.
>>>>>>
>>>>>> On Wed, Dec 2, 2015 at 4:25 PM, sriranga(83yrsold) <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> I have created kan.unicharambigs(attached below) based on the
>>>>>>> output text of Kan.training_text file (which is big). I could not
>>>>>>> understand how to test the attached file and find out whether it works
>>>>>>> or not?
>>>>>>> kindly point out my mistakes in the said attached file, if any, for
>>>>>>> which i shall be thankful to you. I prefer to have commandline test if
>>>>>>> possible.
>>>>>>>
>>>>>>> ==========================================================================
>>>>>>> Based on wiki instruction (extract reproduced below for ready
>>>>>>> reference) =
>>>>>>>
>>>>>>> The rules are not bidirectional, so if you want 'rn' to be
>>>>>>> considered when 'm' is detected and vice versa you need a rule for each.
>>>>>>>
>>>>>>> Version 3.03 and on supports a new, simpler format for the
>>>>>>> unicharambigs file:
>>>>>>>
>>>>>>> v2
>>>>>>> '' " 1
>>>>>>> m rn 0
>>>>>>> iii m 0
>>>>>>>
>>>>>>> In this format, the "error" and "correction" are simple utf-8
>>>>>>> strings separated by *a space*, and, after another space, the same
>>>>>>> type specifier as v1 (0 for optional and 1 for mandatory substitution).
>>>>>>> Note the downside of this simpler format is that Tesseract has to encode
>>>>>>> the utf-8 strings into the components of the unicharset. In complex
>>>>>>> scripts, this encoding may be ambiguous. In this case, the encoding is
>>>>>>> chosen such as to use the least utf-8 characters for each component, ie
>>>>>>> the shortest unicharset components will make up the encoding.
>>>>>>>
>>>>>>> Like most other files used in training, the 'unicharambigs' file
>>>>>>> must be encoded as UTF8, and must end with a newline character. The
>>>>>>> unicharambigs format is also described in the unicharambigs(5) man
>>>>>>> page
>>>>>>> <https://tesseract-ocr.googlecode.com/svn-history/r683/trunk/doc/unicharambigs.5.html>.
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/0d30025d-cc11-4f69-9e98-ec919d3f43df%40googlegroups.com
>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/0d30025d-cc11-4f69-9e98-ec919d3f43df%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
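On the request for a command-line test of the file itself: the checks discussed in this thread (v2 header, three space-separated fields, type 0 or 1, no empty lines, file ending with a newline) can be sketched in a few lines of Python. This is only a rough sketch and not part of Tesseract. The function name `check_unicharambigs_v2` is made up for illustration, and the 10-character limit is an assumption taken from Tom's reading of the source; it is counted here in Unicode code points, which may not be how Tesseract counts it. It only checks the format, not whether a substitution is actually applied during recognition.

```python
import sys

# Assumption: the limit Tom reported from the source; counted in code points here.
MAX_LEN = 10


def check_unicharambigs_v2(text):
    """Return a list of (line_number, message) problems found in v2-format text."""
    problems = []
    if not text.endswith("\n"):
        problems.append((0, "file must end with a newline"))
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    if not lines or lines[0].strip() != "v2":
        problems.append((1, "first line should be 'v2'"))
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            problems.append((number, "empty line"))
            continue
        fields = line.split(" ")
        if len(fields) != 3:
            problems.append((number, "expected 3 space-separated fields, got %d" % len(fields)))
            continue
        bad, good, kind = fields
        if kind not in ("0", "1"):
            problems.append((number, "type must be 0 or 1"))
        for name, value in (("error", bad), ("correction", good)):
            if len(value) > MAX_LEN:
                problems.append((number, "%s string longer than %d" % (name, MAX_LEN)))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="" keeps the line endings exactly as stored in the file
    with open(sys.argv[1], encoding="utf-8", newline="") as handle:
        for number, message in check_unicharambigs_v2(handle.read()):
            print("line %d: %s" % (number, message))
```

Saved under any file name, it would be run with the path to the unicharambigs file as its only argument and prints one line per problem found. The wiki sample quoted above ("v2", `'' " 1`, `m rn 0`, `iii m 0`) passes without any messages, while a line such as `ಹೊರಿದಿವೆ ಹೊಂದಿವೆ.1 .` is reported because its third field is not 0 or 1.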
[Wed Dec 9 16:03:47 IST 2015] /usr/bin//text2image --fonts_dir=/usr/share/fonts/ --font=Arial --outputbase=/tmp/font_tmp.IBGY0ijB6l/sample_text.txt --text=/tmp/font_tmp.IBGY0ijB6l/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l Rendered page 0 to file /tmp/font_tmp.IBGY0ijB6l/sample_text.txt.tif Rtl = 0 ,vertical=0 === Phase I: Generating training images === Rendering using Arial Italic Rendering using Arial Rendering using Courier New Bold Rendering using Courier New Bold Italic Rendering using Courier New Rendering using Courier New Italic Rendering using Arial Bold Rendering using Arial Bold Italic [Wed Dec 9 16:03:52 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Courier_New_Italic.exp0 --font=Courier New Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:03:52 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0 --font=Courier New Bold Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:03:52 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Arial.exp0 --font=Arial --text=../langdata//eng/eng.training_text [Wed Dec 9 16:03:52 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Arial_Bold.exp0 
--font=Arial Bold --text=../langdata//eng/eng.training_text [Wed Dec 9 16:03:52 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Courier_New.exp0 --font=Courier New --text=../langdata//eng/eng.training_text [Wed Dec 9 16:03:52 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0 --font=Arial Bold Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:03:52 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Arial_Italic.exp0 --font=Arial Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:03:52 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Courier_New_Bold.exp0 --font=Courier New Bold --text=../langdata//eng/eng.training_text Rendered page 0 to file /tmp/tesstrain/eng/eng.Courier_New.exp0.tif Rtl = 0 ,vertical=0 Rendered page 0 to file /tmp/tesstrain/eng/eng.Arial.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Courier New Rendered page 0 to file /tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Arial [Wed Dec 9 16:03:59 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words 
--fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Courier_New.exp0 --font=Courier New --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties of Arial Bold Italic [Wed Dec 9 16:03:59 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Arial.exp0 --font=Arial --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only [Wed Dec 9 16:03:59 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0 --font=Arial Bold Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Rendered page 0 to file /tmp/tesstrain/eng/eng.Courier_New_Bold.exp0.tif Rtl = 0 ,vertical=0 Rendered page 0 to file /tmp/tesstrain/eng/eng.Courier_New_Italic.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties only Rendered page 0 to file /tmp/tesstrain/eng/eng.Arial_Bold.exp0.tif Extracting font properties of Courier New Bold Rtl = 0 ,vertical=0 [Wed Dec 9 16:03:59 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Courier_New_Bold.exp0 --font=Courier New Bold --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Rendered 
page 0 to file /tmp/tesstrain/eng/eng.Arial_Italic.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Courier New Italic Rendered page 0 to file /tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Arial Bold Extracting font properties only [Wed Dec 9 16:03:59 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Courier_New_Italic.exp0 --font=Courier New Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 [Wed Dec 9 16:03:59 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Arial_Bold.exp0 --font=Arial Bold --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties of Courier New Bold Italic Extracting font properties of Arial Italic [Wed Dec 9 16:03:59 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0 --font=Courier New Bold Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only [Wed Dec 9 16:03:59 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 
--outputbase=/tmp/tesstrain/eng/eng.Arial_Italic.exp0 --font=Arial Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only Extracting font properties only Extracting font properties only Extracting font properties only Done! Done! Done! Done! Done! Done! Done! Done! Rendering using Times New Roman, Bold Rendering using Times New Roman, Bold Italic Rendering using Times New Roman, Rendering using Georgia Bold Rendering using Georgia Rendering using Times New Roman, Italic Rendering using Georgia Bold Italic Rendering using Georgia Italic [Wed Dec 9 16:04:00 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0 --font=Times New Roman, Bold --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:00 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0 --font=Times New Roman, Bold Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:00 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Georgia.exp0 --font=Georgia --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:00 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 
--outputbase=/tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0 --font=Georgia Bold Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:00 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Georgia_Bold.exp0 --font=Georgia Bold --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:00 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Times_New_Roman.exp0 --font=Times New Roman, --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:00 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Georgia_Italic.exp0 --font=Georgia Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:00 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0 --font=Times New Roman, Italic --text=../langdata//eng/eng.training_text Rendered page 0 to file /tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Times New Roman, Bold Italic [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 
--outputbase=/tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0 --font=Times New Roman, Bold Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Rendered page 0 to file /tmp/tesstrain/eng/eng.Georgia_Italic.exp0.tif Rtl = 0 ,vertical=0 Rendered page 0 to file /tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Georgia Italic Rendered page 0 to file /tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0.tif Rtl = 0 ,vertical=0 [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Georgia_Italic.exp0 --font=Georgia Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties of Times New Roman, Bold Extracting font properties of Times New Roman, Italic Rendered page 0 to file /tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0.tif Rtl = 0 ,vertical=0 [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0 --font=Times New Roman, Bold --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only Extracting font properties only [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0 
--font=Times New Roman, Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties of Georgia Bold Italic Rendered page 0 to file /tmp/tesstrain/eng/eng.Georgia_Bold.exp0.tif Rtl = 0 ,vertical=0 [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0 --font=Georgia Bold Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only Rendered page 0 to file /tmp/tesstrain/eng/eng.Times_New_Roman.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties only Extracting font properties of Georgia Bold Extracting font properties of Times New Roman, Rendered page 0 to file /tmp/tesstrain/eng/eng.Georgia.exp0.tif Rtl = 0 ,vertical=0 [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Georgia_Bold.exp0 --font=Georgia Bold --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Times_New_Roman.exp0 --font=Times New Roman, --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties of Georgia [Wed Dec 9 
16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Georgia.exp0 --font=Georgia --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only Extracting font properties only Extracting font properties only Done! Done! Done! Done! Done! Done! Done! Done! Rendering using Trebuchet MS Bold Italic Rendering using Trebuchet MS Bold Rendering using Trebuchet MS Italic Rendering using Verdana Bold Rendering using Trebuchet MS Rendering using Verdana Italic Rendering using Verdana Rendering using Verdana Bold Italic [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0 --font=Trebuchet MS Bold --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0 --font=Trebuchet MS Bold Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Verdana.exp0 --font=Verdana --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l 
--fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Trebuchet_MS.exp0 --font=Trebuchet MS --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Verdana_Bold.exp0 --font=Verdana Bold --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0 --font=Trebuchet MS Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Verdana_Italic.exp0 --font=Verdana Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:07 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0 --font=Verdana Bold Italic --text=../langdata//eng/eng.training_text Rendered page 0 to file /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Trebuchet MS Bold Italic [Wed Dec 9 16:04:14 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ 
--strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0 --font=Trebuchet MS Bold Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Rendered page 0 to file /tmp/tesstrain/eng/eng.Verdana_Italic.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Verdana Italic [Wed Dec 9 16:04:14 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Verdana_Italic.exp0 --font=Verdana Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Rendered page 0 to file /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties only Rendered page 0 to file /tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0.tif Rtl = 0 ,vertical=0 Rendered page 0 to file /tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Verdana Bold Italic Extracting font properties of Trebuchet MS Bold Extracting font properties of Trebuchet MS Italic [Wed Dec 9 16:04:14 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0 --font=Trebuchet MS Bold --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 [Wed Dec 9 16:04:14 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 
--char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0 --font=Verdana Bold Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 [Wed Dec 9 16:04:14 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0 --font=Trebuchet MS Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only Rendered page 0 to file /tmp/tesstrain/eng/eng.Verdana.exp0.tif Rtl = 0 ,vertical=0 Rendered page 0 to file /tmp/tesstrain/eng/eng.Trebuchet_MS.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties only Extracting font properties only Extracting font properties of Trebuchet MS Extracting font properties only Extracting font properties of Verdana [Wed Dec 9 16:04:14 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Trebuchet_MS.exp0 --font=Trebuchet MS --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 [Wed Dec 9 16:04:14 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Verdana.exp0 --font=Verdana --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only Extracting font properties only Rendered page 0 to file 
/tmp/tesstrain/eng/eng.Verdana_Bold.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Verdana Bold [Wed Dec 9 16:04:15 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Verdana_Bold.exp0 --font=Verdana Bold --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 WARNING: Adjusting to bad page break after '$5' Extracting font properties only Done! Done! Done! Done! Done! Done! Done! WARNING: Adjusting to bad page break after '$5' Done! Rendering using URW Bookman L Bold Italic Rendering using Century Schoolbook L Medium Rendering using Century Schoolbook L Bold Italic Rendering using DejaVu Sans Ultra-Light Rendering using Century Schoolbook L Bold Rendering using URW Bookman L Bold Rendering using URW Bookman L Italic Rendering using Century Schoolbook L Italic [Wed Dec 9 16:04:15 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0 --font=Century Schoolbook L Medium --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:15 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0 --font=URW Bookman L Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:15 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false 
--leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0 --font=URW Bookman L Bold --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:15 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0 --font=DejaVu Sans Ultra-Light --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:15 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0 --font=URW Bookman L Bold Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:15 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0 --font=Century Schoolbook L Bold --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:15 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0 --font=Century Schoolbook L Bold Italic --text=../langdata//eng/eng.training_text [Wed Dec 9 16:04:15 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 
--outputbase=/tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0 --font=Century Schoolbook L Italic --text=../langdata//eng/eng.training_text Rendered page 0 to file /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Century Schoolbook L Medium [Wed Dec 9 16:04:22 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0 --font=Century Schoolbook L Medium --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Rendered page 0 to file /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of URW Bookman L Bold Italic [Wed Dec 9 16:04:22 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0 --font=URW Bookman L Bold Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Rendered page 0 to file /tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0.tif Extracting font properties only Rtl = 0 ,vertical=0 Extracting font properties of DejaVu Sans Ultra-Light Rendered page 0 to file /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0.tif Rtl = 0 ,vertical=0 [Wed Dec 9 16:04:22 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0 --font=DejaVu Sans Ultra-Light 
--ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only Rendered page 0 to file /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0.tif Rtl = 0 ,vertical=0 Extracting font properties of Century Schoolbook L Italic Rendered page 0 to file /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0.tif Rtl = 0 ,vertical=0 [Wed Dec 9 16:04:22 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0 --font=Century Schoolbook L Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Rendered page 0 to file /tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0.tif Extracting font properties of URW Bookman L Bold Extracting font properties of Century Schoolbook L Bold Rtl = 0 ,vertical=0 Extracting font properties only Rendered page 0 to file /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0.tif Rtl = 0 ,vertical=0 [Wed Dec 9 16:04:22 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0 --font=URW Bookman L Bold --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties of URW Bookman L Italic [Wed Dec 9 16:04:22 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0 
--font=Century Schoolbook L Bold --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties of Century Schoolbook L Bold Italic Extracting font properties only [Wed Dec 9 16:04:22 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0 --font=URW Bookman L Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only [Wed Dec 9 16:04:22 IST 2015] /usr/bin//text2image --fontconfig_tmpdir=/tmp/font_tmp.IBGY0ijB6l --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0 --font=Century Schoolbook L Bold Italic --ligatures=false --text=../langdata//eng/eng.training_text.train_ngrams --only_extract_font_properties --ptsize=32 Extracting font properties only Extracting font properties only Extracting font properties only Done! Done! Done! Done! Done! Done! Done! Done! 
=== Phase UP: Generating unicharset and unichar properties files ===
[Wed Dec 9 16:04:23 IST 2015] /usr/bin//unicharset_extractor -D /tmp/tesstrain/eng/ /tmp/tesstrain/eng/eng.Arial_Bold.exp0.box /tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0.box /tmp/tesstrain/eng/eng.Arial.exp0.box /tmp/tesstrain/eng/eng.Arial_Italic.exp0.box /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0.box /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0.box /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0.box /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0.box /tmp/tesstrain/eng/eng.Courier_New_Bold.exp0.box /tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0.box /tmp/tesstrain/eng/eng.Courier_New.exp0.box /tmp/tesstrain/eng/eng.Courier_New_Italic.exp0.box /tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0.box /tmp/tesstrain/eng/eng.Georgia_Bold.exp0.box /tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0.box /tmp/tesstrain/eng/eng.Georgia.exp0.box /tmp/tesstrain/eng/eng.Georgia_Italic.exp0.box /tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0.box /tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0.box /tmp/tesstrain/eng/eng.Times_New_Roman.exp0.box /tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0.box /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0.box /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0.box /tmp/tesstrain/eng/eng.Trebuchet_MS.exp0.box /tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0.box /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0.box /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0.box /tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0.box /tmp/tesstrain/eng/eng.Verdana_Bold.exp0.box /tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0.box /tmp/tesstrain/eng/eng.Verdana.exp0.box /tmp/tesstrain/eng/eng.Verdana_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Arial_Bold.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Arial.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Arial_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Courier_New_Bold.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Courier_New.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Courier_New_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Georgia_Bold.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Georgia.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Georgia_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Times_New_Roman.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Trebuchet_MS.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Verdana_Bold.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Verdana.exp0.box
Extracting unicharset from /tmp/tesstrain/eng/eng.Verdana_Italic.exp0.box
Wrote unicharset file /tmp/tesstrain/eng//unicharset.
[Wed Dec 9 16:04:23 IST 2015] /usr/bin//set_unicharset_properties -U /tmp/tesstrain/eng/eng.unicharset -O /tmp/tesstrain/eng/eng.unicharset -X /tmp/tesstrain/eng/eng.xheights --script_dir=../langdata/
Loaded unicharset of size 32 from file /tmp/tesstrain/eng/eng.unicharset
Setting unichar properties
Other case I of i is not in unicharset
Other case M of m is not in unicharset
Other case X of x is not in unicharset
Other case E of e is not in unicharset
Other case U of u is not in unicharset
Other case C of c is not in unicharset
Other case P of p is not in unicharset
Other case R of r is not in unicharset
Other case H of h is not in unicharset
Other case G of g is not in unicharset
Other case D of d is not in unicharset
Other case S of s is not in unicharset
Other case K of k is not in unicharset
Writing unicharset to file /tmp/tesstrain/eng/eng.unicharset
=== Phase D: Generating Dawg files ===
Generating word Dawg
[Wed Dec 9 16:04:23 IST 2015] /usr/bin//wordlist2dawg -r 1 ../langdata//eng/eng.wordlist /tmp/tesstrain/eng/eng.word-dawg /tmp/tesstrain/eng/eng.unicharset
Set reverse_policy to RRP_REVERSE_IF_HAS_RTL
Loading unicharset from '/tmp/tesstrain/eng/eng.unicharset'
Reading word list from '../langdata//eng/eng.wordlist'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tesstrain/eng/eng.word-dawg'
Generating frequent-word Dawg
[Wed Dec 9 16:04:24 IST 2015] /usr/bin//wordlist2dawg -r 1 /tmp/tesstrain/eng/eng.wordlist.clean.freq /tmp/tesstrain/eng/eng.freq-dawg /tmp/tesstrain/eng/eng.unicharset
Set reverse_policy to RRP_REVERSE_IF_HAS_RTL
Loading unicharset from '/tmp/tesstrain/eng/eng.unicharset'
Reading word list from '/tmp/tesstrain/eng/eng.wordlist.clean.freq'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tesstrain/eng/eng.freq-dawg'
[Wed Dec 9 16:04:24 IST 2015] /usr/bin//wordlist2dawg -r 0 ../langdata//eng/eng.punc /tmp/tesstrain/eng/eng.punc-dawg /tmp/tesstrain/eng/eng.unicharset
Set reverse_policy to RRP_DO_NO_REVERSE
Loading unicharset from '/tmp/tesstrain/eng/eng.unicharset'
Reading word list from '../langdata//eng/eng.punc'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tesstrain/eng/eng.punc-dawg'
[Wed Dec 9 16:04:24 IST 2015] /usr/bin//wordlist2dawg -r 0 ../langdata//eng/eng.numbers /tmp/tesstrain/eng/eng.number-dawg /tmp/tesstrain/eng/eng.unicharset
Set reverse_policy to RRP_DO_NO_REVERSE
Loading unicharset from '/tmp/tesstrain/eng/eng.unicharset'
Reading word list from '../langdata//eng/eng.numbers'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tesstrain/eng/eng.number-dawg'
[Wed Dec 9 16:04:24 IST 2015] /usr/bin//wordlist2dawg -r 1 ../langdata//eng/eng.word.bigrams /tmp/tesstrain/eng/eng.bigram-dawg /tmp/tesstrain/eng/eng.unicharset
Set reverse_policy to RRP_REVERSE_IF_HAS_RTL
Loading unicharset from '/tmp/tesstrain/eng/eng.unicharset'
Reading word list from '../langdata//eng/eng.word.bigrams'
Reducing Trie to SquishedDawg
Writing squished DAWG to '/tmp/tesstrain/eng/eng.bigram-dawg'
=== Phase E: Extracting features ===
Using TESSDATA_PREFIX=/usr/local/share/tessdata/
[Wed Dec 9 16:04:31 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Arial_Bold.exp0.tif /tmp/tesstrain/eng/eng.Arial_Bold.exp0 box.train
[Wed Dec 9 16:04:31 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0.tif /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0 box.train
[Wed Dec 9 16:04:31 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0.tif /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0 box.train
[Wed Dec 9 16:04:31 IST 2015] /usr/bin//tesseract
/tmp/tesstrain/eng/eng.Arial.exp0.tif /tmp/tesstrain/eng/eng.Arial.exp0 box.train [Wed Dec 9 16:04:31 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0.tif /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0 box.train [Wed Dec 9 16:04:32 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Arial_Italic.exp0.tif /tmp/tesstrain/eng/eng.Arial_Italic.exp0 box.train [Wed Dec 9 16:04:32 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0.tif /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0 box.train Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica [Wed Dec 9 16:04:32 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0.tif /tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0 box.train Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. APPLY_BOXES: APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. Boxes read from boxfile: 150 Found 150 good blobs. Generated training data for 42 words APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. Generated training data for 42 words APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. Generated training data for 43 words Generated training data for 44 words APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. APPLY_BOXES: Generated training data for 42 words Boxes read from boxfile: 150 Found 150 good blobs. 
Generated training data for 42 words Generated training data for 43 words Generated training data for 42 words [Wed Dec 9 16:04:40 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Georgia.exp0.tif /tmp/tesstrain/eng/eng.Georgia.exp0 box.train [Wed Dec 9 16:04:40 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Courier_New_Italic.exp0.tif /tmp/tesstrain/eng/eng.Courier_New_Italic.exp0 box.train [Wed Dec 9 16:04:40 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0.tif /tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0 box.train [Wed Dec 9 16:04:40 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Courier_New_Bold.exp0.tif /tmp/tesstrain/eng/eng.Courier_New_Bold.exp0 box.train [Wed Dec 9 16:04:40 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0.tif /tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0 box.train [Wed Dec 9 16:04:40 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0.tif /tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0 box.train [Wed Dec 9 16:04:40 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Courier_New.exp0.tif /tmp/tesstrain/eng/eng.Courier_New.exp0 box.train Tesseract Open Source OCR Engine v3.04.00 with Leptonica [Wed Dec 9 16:04:40 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Georgia_Bold.exp0.tif /tmp/tesstrain/eng/eng.Georgia_Bold.exp0 box.train Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 row xheight=20.9703, but median xheight = 24.4062 APPLY_BOXES: APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. 
Boxes read from boxfile: 150 Found 150 good blobs. APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. APPLY_BOXES: APPLY_BOXES: Boxes read from boxfile: 150 Boxes read from boxfile: 150 Found 150 good blobs. Found 150 good blobs. row xheight=24.2451, but median xheight = 27.85 APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. Generated training data for 42 words Generated training data for 42 words Generated training data for 41 words Generated training data for 42 words APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. Generated training data for 42 words Generated training data for 43 words Generated training data for 42 words Generated training data for 43 words [Wed Dec 9 16:04:48 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0.tif /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0 box.train [Wed Dec 9 16:04:48 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0.tif /tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0 box.train [Wed Dec 9 16:04:48 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0.tif /tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0 box.train [Wed Dec 9 16:04:48 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Trebuchet_MS.exp0.tif /tmp/tesstrain/eng/eng.Trebuchet_MS.exp0 box.train [Wed Dec 9 16:04:48 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Georgia_Italic.exp0.tif /tmp/tesstrain/eng/eng.Georgia_Italic.exp0 box.train Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica [Wed Dec 9 16:04:48 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0.tif /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0 box.train [Wed Dec 9 16:04:48 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0.tif 
/tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0 box.train [Wed Dec 9 16:04:48 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Times_New_Roman.exp0.tif /tmp/tesstrain/eng/eng.Times_New_Roman.exp0 box.train Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. Generated training data for 42 words APPLY_BOXES: APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. Boxes read from boxfile: 150 Found 150 good blobs. Generated training data for 42 words Generated training data for 42 words APPLY_BOXES: APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. Boxes read from boxfile: 150 Found 150 good blobs. 
Generated training data for 48 words Generated training data for 43 words Generated training data for 45 words Generated training data for 42 words Generated training data for 42 words [Wed Dec 9 16:04:56 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0.tif /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0 box.train [Wed Dec 9 16:04:56 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0.tif /tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0 box.train [Wed Dec 9 16:04:56 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0.tif /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0 box.train [Wed Dec 9 16:04:56 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0.tif /tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0 box.train [Wed Dec 9 16:04:56 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Verdana_Bold.exp0.tif /tmp/tesstrain/eng/eng.Verdana_Bold.exp0 box.train Tesseract Open Source OCR Engine v3.04.00 with Leptonica [Wed Dec 9 16:04:56 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0.tif /tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0 box.train [Wed Dec 9 16:04:56 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Verdana.exp0.tif /tmp/tesstrain/eng/eng.Verdana.exp0 box.train Tesseract Open Source OCR Engine v3.04.00 with Leptonica [Wed Dec 9 16:04:56 IST 2015] /usr/bin//tesseract /tmp/tesstrain/eng/eng.Verdana_Italic.exp0.tif /tmp/tesstrain/eng/eng.Verdana_Italic.exp0 box.train Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Tesseract Open Source OCR Engine v3.04.00 with Leptonica Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 Page 1 APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good 
blobs.
APPLY_BOXES: APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs. Boxes read from boxfile: 150 Found 150 good blobs.
Generated training data for 44 words
Generated training data for 42 words
FAIL! APPLY_BOXES: boxfile line 1/v ((113,4658),(140,4684)): FAILURE! Couldn't find a matching blob
FAIL! APPLY_BOXES: boxfile line 2/v ((140,4658),(167,4684)): FAILURE! Couldn't find a matching blob
FAIL! APPLY_BOXES: boxfile line 3/w ((183,4657),(226,4684)): FAILURE! Couldn't find a matching blob
FAIL! APPLY_BOXES: boxfile line 4/V ((113,4559),(151,4593)): FAILURE! Couldn't find a matching blob
FAIL! APPLY_BOXES: boxfile line 5/V ((146,4559),(184,4593)): FAILURE! Couldn't find a matching blob
FAIL! APPLY_BOXES: boxfile line 6/W ((197,4558),(248,4592)): FAILURE! Couldn't find a matching blob
FAIL! APPLY_BOXES: boxfile line 7/i ((114,4460),(131,4499)): FAILURE! Couldn't find a matching blob
FAIL! APPLY_BOXES: boxfile line 8/i ((133,4460),(150,4499)): FAILURE! Couldn't find a matching blob
FAIL! APPLY_BOXES: boxfile line 9/i ((152,4460),(169,4499)): FAILURE! Couldn't find a matching blob
FAIL! APPLY_BOXES: boxfile line 10/m ((187,4459),(234,4486)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: Boxes read from boxfile: 150 Boxes failed resegmentation: 10 Found 140 good blobs.
Generated training data for 43 words
APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs.
Generated training data for 38 words
APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs.
APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs.
Generated training data for 42 words
Generated training data for 42 words
Generated training data for 44 words
APPLY_BOXES: Boxes read from boxfile: 150 Found 150 good blobs.
Generated training data for 44 words === Phase C: Clustering feature prototypes (cnTraining) === [Wed Dec 9 16:05:04 IST 2015] /usr/bin//cntraining -D /tmp/tesstrain/eng/ /tmp/tesstrain/eng/eng.Arial_Bold.exp0.tr /tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Arial.exp0.tr /tmp/tesstrain/eng/eng.Arial_Italic.exp0.tr /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0.tr /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0.tr /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0.tr /tmp/tesstrain/eng/eng.Courier_New_Bold.exp0.tr /tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Courier_New.exp0.tr /tmp/tesstrain/eng/eng.Courier_New_Italic.exp0.tr /tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0.tr /tmp/tesstrain/eng/eng.Georgia_Bold.exp0.tr /tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Georgia.exp0.tr /tmp/tesstrain/eng/eng.Georgia_Italic.exp0.tr /tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0.tr /tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Times_New_Roman.exp0.tr /tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0.tr /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0.tr /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Trebuchet_MS.exp0.tr /tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0.tr /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0.tr /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0.tr /tmp/tesstrain/eng/eng.Verdana_Bold.exp0.tr /tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Verdana.exp0.tr /tmp/tesstrain/eng/eng.Verdana_Italic.exp0.tr Reading /tmp/tesstrain/eng/eng.Arial_Bold.exp0.tr ... Reading /tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0.tr ... Reading /tmp/tesstrain/eng/eng.Arial.exp0.tr ... Reading /tmp/tesstrain/eng/eng.Arial_Italic.exp0.tr ... 
Reading /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Courier_New_Bold.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Courier_New.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Courier_New_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Georgia_Bold.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Georgia.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Georgia_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Times_New_Roman.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Trebuchet_MS.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Verdana_Bold.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Verdana.exp0.tr ...
Reading /tmp/tesstrain/eng/eng.Verdana_Italic.exp0.tr ...
Clustering ...
Writing /tmp/tesstrain/eng//normproto ...

=== Phase M : Clustering microfeatures (mfTraining) ===
[Wed Dec 9 16:05:05 IST 2015] /usr/bin//mftraining -D /tmp/tesstrain/eng/ -U /tmp/tesstrain/eng/eng.unicharset -O /tmp/tesstrain/eng/eng.mfunicharset -F ../langdata//font_properties -X /tmp/tesstrain/eng/eng.xheights /tmp/tesstrain/eng/eng.Arial_Bold.exp0.tr /tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Arial.exp0.tr /tmp/tesstrain/eng/eng.Arial_Italic.exp0.tr /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0.tr /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0.tr /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0.tr /tmp/tesstrain/eng/eng.Courier_New_Bold.exp0.tr /tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Courier_New.exp0.tr /tmp/tesstrain/eng/eng.Courier_New_Italic.exp0.tr /tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0.tr /tmp/tesstrain/eng/eng.Georgia_Bold.exp0.tr /tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Georgia.exp0.tr /tmp/tesstrain/eng/eng.Georgia_Italic.exp0.tr /tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0.tr /tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Times_New_Roman.exp0.tr /tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0.tr /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0.tr /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Trebuchet_MS.exp0.tr /tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0.tr /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0.tr /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0.tr /tmp/tesstrain/eng/eng.Verdana_Bold.exp0.tr /tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0.tr /tmp/tesstrain/eng/eng.Verdana.exp0.tr /tmp/tesstrain/eng/eng.Verdana_Italic.exp0.tr
Warning: No shape table file present: /tmp/tesstrain/eng//shapetable
fontinfo table is of size 6164
Reading x-heights from
/tmp/tesstrain/eng/eng.xheights ...
Reading /tmp/tesstrain/eng/eng.Arial_Bold.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Arial_Bold.exp0.fontinfo for font 277...
Reading /tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Arial_Bold_Italic.exp0.fontinfo for font 278...
Reading /tmp/tesstrain/eng/eng.Arial.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Arial.exp0.fontinfo for font 275...
Reading /tmp/tesstrain/eng/eng.Arial_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Arial_Italic.exp0.fontinfo for font 287...
Reading /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold.exp0.fontinfo for font 1173...
Reading /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Bold_Italic.exp0.fontinfo for font 1174...
Reading /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Italic.exp0.fontinfo for font 1175...
Reading /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Century_Schoolbook_L_Medium.exp0.fontinfo for font 1176...
Reading /tmp/tesstrain/eng/eng.Courier_New_Bold.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Courier_New_Bold.exp0.fontinfo for font 1478...
Reading /tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Courier_New_Bold_Italic.exp0.fontinfo for font 1479...
Reading /tmp/tesstrain/eng/eng.Courier_New.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Courier_New.exp0.fontinfo for font 1477...
Reading /tmp/tesstrain/eng/eng.Courier_New_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Courier_New_Italic.exp0.fontinfo for font 1488...
Reading /tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.DejaVu_Sans_Ultra-Light.exp0.fontinfo for font 1588...
Reading /tmp/tesstrain/eng/eng.Georgia_Bold.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Georgia_Bold.exp0.fontinfo for font 2327...
Reading /tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Georgia_Bold_Italic.exp0.fontinfo for font 2328...
Reading /tmp/tesstrain/eng/eng.Georgia.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Georgia.exp0.fontinfo for font 2326...
Reading /tmp/tesstrain/eng/eng.Georgia_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Georgia_Italic.exp0.fontinfo for font 2329...
Reading /tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Times_New_Roman_Bold.exp0.fontinfo for font 5642...
Reading /tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Times_New_Roman_Bold_Italic.exp0.fontinfo for font 5643...
Reading /tmp/tesstrain/eng/eng.Times_New_Roman.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Times_New_Roman.exp0.fontinfo for font 5641...
Reading /tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Times_New_Roman_Italic.exp0.fontinfo for font 5652...
Reading /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold.exp0.fontinfo for font 5748...
Reading /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Trebuchet_MS_Bold_Italic.exp0.fontinfo for font 5749...
Reading /tmp/tesstrain/eng/eng.Trebuchet_MS.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Trebuchet_MS.exp0.fontinfo for font 5747...
Reading /tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Trebuchet_MS_Italic.exp0.fontinfo for font 5750...
Reading /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold.exp0.fontinfo for font 5885...
Reading /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.URW_Bookman_L_Bold_Italic.exp0.fontinfo for font 5886...
Reading /tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.URW_Bookman_L_Italic.exp0.fontinfo for font 5887...
Reading /tmp/tesstrain/eng/eng.Verdana_Bold.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Verdana_Bold.exp0.fontinfo for font 5944...
Reading /tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Verdana_Bold_Italic.exp0.fontinfo for font 5945...
Reading /tmp/tesstrain/eng/eng.Verdana.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Verdana.exp0.fontinfo for font 5943...
Reading /tmp/tesstrain/eng/eng.Verdana_Italic.exp0.tr ...
Reading spacing from /tmp/tesstrain/eng/eng.Verdana_Italic.exp0.fontinfo for font 5946...
Flat shape table summary: Number of shapes = 926 max unichars = 1 number with multiple unichars = 0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Done!

=== Phase B : ambiguities training ===
Found file ../langdata//eng/eng.unicharambigs

=== Making final traineddata file ===
Copying ../langdata//eng/eng.cube-unicharset to /tmp/tesstrain/eng
Copying ../langdata//eng/eng.cube-word-dawg to /tmp/tesstrain/eng
[Wed Dec 9 16:05:11 IST 2015] /usr/bin//combine_tessdata /tmp/tesstrain/eng/eng.
TessdataManager combined tesseract data files.
Offset for type 0 (/tmp/tesstrain/eng/eng.config ) is -1
Offset for type 1 (/tmp/tesstrain/eng/eng.unicharset ) is 140
Offset for type 2 (/tmp/tesstrain/eng/eng.unicharambigs ) is 2199
Offset for type 3 (/tmp/tesstrain/eng/eng.inttemp ) is 2476
Offset for type 4 (/tmp/tesstrain/eng/eng.pffmtable ) is 471286
Offset for type 5 (/tmp/tesstrain/eng/eng.normproto ) is 471533
Offset for type 6 (/tmp/tesstrain/eng/eng.punc-dawg ) is 475195
Offset for type 7 (/tmp/tesstrain/eng/eng.word-dawg ) is 475213
Offset for type 8 (/tmp/tesstrain/eng/eng.number-dawg ) is 1197367
Offset for type 9 (/tmp/tesstrain/eng/eng.freq-dawg ) is 1198025
Offset for type 10 (/tmp/tesstrain/eng/eng.fixed-length-dawgs ) is -1
Offset for type 11 (/tmp/tesstrain/eng/eng.cube-unicharset ) is 1198779
Offset for type 12 (/tmp/tesstrain/eng/eng.cube-word-dawg ) is 1200290
Offset for type 13 (/tmp/tesstrain/eng/eng.shapetable ) is 2262396
Offset for type 14 (/tmp/tesstrain/eng/eng.bigram-dawg ) is 2279068
Offset for type 15 (/tmp/tesstrain/eng/eng.unambig-dawg ) is -1
Offset for type 16 (/tmp/tesstrain/eng/eng.params-model ) is -1
Combining tessdata files
Output /tmp/tesstrain/eng/eng.traineddata created sucessfully.
Moving /tmp/tesstrain/eng/eng.traineddata to /tmp/tesstrain/tessdata
Completed training for language 'eng'
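
For anyone who wants to check quickly whether a custom eng.unicharambigs actually made it into the combined file, the "Offset for type" lines in the log above can be scanned with a short script. This is only a rough sketch: the helper names (parse_offsets, missing_components) are made up for illustration, and it assumes that an offset of -1 means combine_tessdata did not pack that component, which is how the log reads but is not something the log itself states.

```python
import re

# Matches log lines such as:
#   Offset for type 2 (/tmp/tesstrain/eng/eng.unicharambigs ) is 2199
OFFSET_RE = re.compile(r"Offset for type\s+(\d+)\s+\((\S+)\s*\)\s+is\s+(-?\d+)")


def parse_offsets(log_text):
    """Return {component_name: offset} for every 'Offset for type' entry.

    Hypothetical helper, not part of Tesseract.
    """
    offsets = {}
    for _type_id, path, offset in OFFSET_RE.findall(log_text):
        # '/tmp/tesstrain/eng/eng.unicharambigs' -> 'unicharambigs'
        component = path.rsplit("/", 1)[-1].split(".", 1)[-1]
        offsets[component] = int(offset)
    return offsets


def missing_components(offsets):
    """Components whose offset is -1 (assumed here to mean 'not packed')."""
    return sorted(name for name, off in offsets.items() if off == -1)


# A few lines copied from the log above; the parser also works on the
# flattened, single-line form of the log.
sample = (
    "Offset for type 0 (/tmp/tesstrain/eng/eng.config ) is -1 "
    "Offset for type 1 (/tmp/tesstrain/eng/eng.unicharset ) is 140 "
    "Offset for type 2 (/tmp/tesstrain/eng/eng.unicharambigs ) is 2199 "
    "Offset for type 15 (/tmp/tesstrain/eng/eng.unambig-dawg ) is -1"
)

offsets = parse_offsets(sample)
print("unicharambigs offset:", offsets.get("unicharambigs"))
print("missing:", missing_components(offsets))
```

In this run the unicharambigs entry has a real offset (2199), which agrees with the "Found file ../langdata//eng/eng.unicharambigs" message in Phase B.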

