The manual "Training Tesseract 3" says:

> Tesseract needs to know about different shapes of the same character by having different fonts separated explicitly.
> This used to be limited to 32 fonts, but the limit has been raised to 64.
> It is set by the constant MAX_NUM_CONFIGS defined in intproto.h.
> Note that runtime is heavily dependent on the number of fonts provided, and training more than 32 will result in a significant slow-down.
I analyzed the number of fonts in eng.traineddata and was very surprised to find that 358 fonts have been trained: get_fontinfo_table().size() returns 358! Can anybody explain this contradiction to me?

Fonts in eng.traineddata:
AR_PL_UKai_CN, AR_PL_UKai_Patched, AR_PL_UKai_TW, AR_PL_UMing_CN_Light, AR_PL_UMing_Patched_Light, AR_PL_UMing_TW_MBE_Light, Aboriginal_Sans, Aboriginal_Sans_Bold_Italic, Aboriginal_Sans_Italic, Aboriginal_Serif, Aboriginal_Serif_Bold, Aboriginal_Serif_Bold_Italic, Aboriginal_Serif_Italic, Abyssinica_SIL, AlArabiya, AlBattar, AlHor, AlManzomah, AlMohanad, Andale_Mono, Ani, AnjaliOldLipi, Arab, Arial, Arial_Black, Arial_Bold, Arial_Bold_Italic, Arial_Italic, BPG_Chveulebrivi, BPG_Chveulebrivi_Bold, BPG_Courier, BPG_Courier_Bold, BPG_Elite, BPG_Elite_Bold, BPG_Glaho, BPG_Glaho_Bold, BPG_Rioni, BPG_Rioni_Bold, BPG_Unicode_Standard, Baekmuk_Batang, Baekmuk_Batang_Patched, Baekmuk_Dotum, Baekmuk_Gulim, Baekmuk_Headline, Bangla, Bitstream_Vera_Sans, Bitstream_Vera_Sans_Bold, Bitstream_Vera_Sans_Bold_Oblique, Bitstream_Vera_Sans_Mono, Bitstream_Vera_Sans_Mono_Bold, Bitstream_Vera_Sans_Mono_Bold_Oblique, Bitstream_Vera_Sans_Mono_Oblique, Bitstream_Vera_Sans_Mono_Roman, Bitstream_Vera_Sans_Oblique, Bitstream_Vera_Sans_Roman, Bitstream_Vera_Serif, Bitstream_Vera_Serif_Bold, Bitstream_Vera_Serif_Roman, CaslonishFraxx, Century_Schoolbook_L, Century_Schoolbook_L_Bold, Century_Schoolbook_L_Bold_Italic, Century_Schoolbook_L_Italic, Century_Schoolbook_L_Roman, Chandas, Cloister_Black_Light, Comic_Sans_MS, Comic_Sans_MS_Bold, Cortoba, Courier_New, Courier_New_Bold, Courier_New_Bold_Italic, Courier_New_Italic, DejaVu_Sans, DejaVu_Sans_Bold, DejaVu_Sans_Bold_Oblique, DejaVu_Sans_Condensed, DejaVu_Sans_Condensed_Bold, DejaVu_Sans_Condensed_Bold_Oblique, DejaVu_Sans_Condensed_Oblique, DejaVu_Sans_Mono, DejaVu_Sans_Mono_Bold, DejaVu_Sans_Mono_Bold_Oblique, DejaVu_Sans_Mono_Oblique, DejaVu_Sans_Oblique, DejaVu_Sans_Ultra-Light, DejaVu_Serif, DejaVu_Serif_Bold,
DejaVu_Serif_Bold_Italic, DejaVu_Serif_Bold_Oblique, DejaVu_Serif_Bold_Semi-Condensed, DejaVu_Serif_Condensed_Bold, DejaVu_Serif_Condensed_Bold_Italic, DejaVu_Serif_Condensed_Italic, DejaVu_Serif_Italic, DejaVu_Serif_Oblique, DejaVu_Serif_Semi-Condensed, Dimnah, Dustismo, Dustismo_Roman, Dustismo_Roman_Bold, Dustismo_Roman_Italic, Dustismo_Roman_Italic_Bold, Dyuthi, East_Syriac_Adiabene, East_Syriac_Ctesiphon, Electron, Estrangelo_Antioch, Estrangelo_Edessa, Estrangelo_Midyat, Estrangelo_Nisibin, Estrangelo_Quenneshrin, Estrangelo_Talada, Estrangelo_TurAbdin, FreeMono, FreeMono_Bold, FreeMono_Bold_Italic, FreeMono_Bold_Oblique, FreeMono_Italic, FreeMono_Oblique, FreeSans, FreeSans_Bold, FreeSans_Bold_Oblique, FreeSans_Oblique, FreeSerif, FreeSerif_Bold, FreeSerif_Bold_Italic, FreeSerif_Italic, Furat, Garuda, Garuda_Bold, Garuda_Bold_Oblique, Garuda_Oblique, GentiumAlt, GentiumAlt_Italic, Georgia, Georgia_Bold, Georgia_Bold_Italic, Georgia_Italic, Granada, Graph, Hani, Haramain, Hor, IPAGothic, IPAMincho, IPAPGothic, IPAPMincho, IPAUIGothic, Impact, Impact_Condensed, Jamrul, Jamrul_Semi-Expanded, Japan, Jet, Kalimati, Kalyani, Kayrawan, Kedage, Kedage_Bold, Kedage_Bold_Italic, Kedage_Italic, Khalid, Khmer_OS, Khmer_OS_Battambang, Khmer_OS_Bokor, Khmer_OS_Content, Khmer_OS_Fasthand, Khmer_OS_Freehand, Khmer_OS_Metal_Chrieng, Khmer_OS_Muol, Khmer_OS_Muol_Light, Khmer_OS_Muol_Pali, Khmer_OS_Siemreap, Khmer_OS_System, Kochi_Gothic, Kochi_Mincho, LKLUG, Lateef, Likhan, Linux_Biolinum_O, Linux_Biolinum_O_Bold, Linux_Libertine_O, Linux_Libertine_O_Bold, Linux_Libertine_O_Bold_Italic, Linux_Libertine_O_C, Linux_Libertine_O_Italic, Lohit_Assamese, Lohit_Bengali, Lohit_Gujarati, Lohit_Hindi, Lohit_Malayalam, Lohit_Oriya, Lohit_Punjabi, Lohit_Tamil, Lohit_Telugu, Loma, Loma_Bold, Loma_Bold_Oblique, Loma_Oblique, Lucida_Bright, Lucida_Bright_Italic, Lucida_Bright_Semi-Bold, Lucida_Bright_Semi-Bold_Italic, Lucida_Sans, Lucida_Sans_Oblique, Lucida_Sans_Semi-Bold, 
Lucida_Sans_Semi-Bold_Oblique, Lucida_Sans_Typewriter, Lucida_Sans_Typewriter_Bold, Lucida_Sans_Typewriter_Bold_Oblique, Mallige, Mallige_Bold, Mallige_Bold_Italic, Mallige_Italic, Mashq, Meera, Metal, Mitra_Mono, Monapo, Mukti_Narrow, Mukti_Narrow_Bold, Nada, Nagham, Nice, Norasi, Norasi_Bold, Norasi_Bold_Italic, Norasi_Bold_Oblique, Norasi_Italic, Norasi_Oblique, OpenSymbol, Ostorah, Padauk, Padauk_Bold, Petra, Phetsarath_OT, Pothana2000, Proclamate_Light, Purisa_Light, Rachana, Rachana_w01, RaghuMalayalam, Rehan, Rekha, Saab, Salem, Samanata, Samyak_Gujarati, Samyak_Oriya, Sazanami_Gothic, Sazanami_Mincho, Scheherazade, Serto_Batnan, Serto_Batnan_Bold, Serto_Jerusalem, Serto_Jerusalem_Bold, Serto_Jerusalem_Italic, Serto_Kharput, Serto_Malankara, Serto_Mardin, Serto_Mardin_Bold, Serto_Urhoy, Serto_Urhoy_Bold, Shado, Sharjah, TAMu_Kadambri, TAMu_Kalyani, TAMu_Maduram, TSCu_Comic, TSCu_Paranar, TSCu_Paranar_Bold, TSCu_Paranar_Italic, TSCu_Times, TakaoExGothic, TakaoExMincho, TakaoGothic, TakaoMincho, TakaoPGothic, TakaoPMincho, Tarablus, Tholoth, Tibetan_Machine_Uni, Times_New_Roman, Times_New_Roman_Bold, Times_New_Roman_Bold_Italic, Times_New_Roman_Italic, TlwgMono, TlwgMono_Bold, TlwgMono_Bold_Oblique, TlwgMono_Oblique, TlwgTypewriter, TlwgTypewriter_Bold, TlwgTypewriter_Bold_Oblique, TlwgTypewriter_Oblique, Trebuchet_MS, Trebuchet_MS_Bold, Trebuchet_MS_Bold_Italic, Trebuchet_MS_Italic, URW_Bookman_L, URW_Bookman_L_Bold, URW_Bookman_L_Bold_Italic, URW_Bookman_L_Italic, URW_Bookman_L_Light_Italic, UmePlus_Gothic, UmePlus_P_Gothic, UnBatang, UnBatang_Bold, UnDotum, UnDotum_Bold, UnifrakturMaguntia, Unikurd_Web, Uttara, VL_Gothic, VL_PGothic, Vemana2000, Verdana, Verdana_Bold, Verdana_Bold_Italic, Verdana_Italic, Walbaum-Fraktur, Webdings, WenQuanYi_Zen_Hei, Wyld, Wyld_Italic, aakar, batang, chandas1-1, chandas1-2, cheluvi, dotum, gargi, gulim, hline, ipag, ipagp, ipagui, ipam, ipamp, kalimati, kochi-gothic, kochi-gothic-subst, kochi-mincho, kochi-mincho-subst, 
lklug, lohit_bn, lohit_gu, lohit_hi, lohit_ml, lohit_or, lohit_pa, lohit_ta, lohit_te, monapo, ori1Uni, padmaa, padmaa_Bold, suruma

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/114f31b1-1c30-4ffe-a8d1-375c82e4cfc6%40googlegroups.com.

