If you have a look at intproto.h, you'll see there is a similar limitation, but it's much more complicated. Unfortunately I don't have an overview of what is possible yet, but I'm working on it. :) Just use normproto.h as a reference.

On Tuesday, 8 July 2014 at 02:55:37 UTC+2, Albrecht Hilker wrote:
>
> The manual "Training Tesseract 3" says:
>
> Tesseract needs to know about different shapes of the same character by
> having different fonts separated explicitly.
> This used to be limited to 32 fonts, but the limit has been raised to 64.
> It is set by the constant MAX_NUM_CONFIGS defined in intproto.h.
> Note that runtime is heavily dependent on the number of fonts provided,
> and training more than 32 will result in a significant slow-down.
>
> I analyzed the number of fonts in eng.traineddata and was very surprised
> to find that 358 fonts have been trained!
> get_fontinfo_table().size() returns 358!
>
> Can anybody explain this contradiction to me?
>
> Fonts in eng.traineddata:
>
> AR_PL_UKai_CN, AR_PL_UKai_Patched, AR_PL_UKai_TW, AR_PL_UMing_CN_Light,
> AR_PL_UMing_Patched_Light, AR_PL_UMing_TW_MBE_Light, Aboriginal_Sans,
> Aboriginal_Sans_Bold_Italic, Aboriginal_Sans_Italic, Aboriginal_Serif,
> Aboriginal_Serif_Bold, Aboriginal_Serif_Bold_Italic, Aboriginal_Serif_Italic,
> Abyssinica_SIL, AlArabiya, AlBattar, AlHor, AlManzomah, AlMohanad,
> Andale_Mono, Ani, AnjaliOldLipi, Arab, Arial, Arial_Black, Arial_Bold,
> Arial_Bold_Italic, Arial_Italic, BPG_Chveulebrivi, BPG_Chveulebrivi_Bold,
> BPG_Courier, BPG_Courier_Bold, BPG_Elite, BPG_Elite_Bold, BPG_Glaho,
> BPG_Glaho_Bold, BPG_Rioni, BPG_Rioni_Bold, BPG_Unicode_Standard,
> Baekmuk_Batang, Baekmuk_Batang_Patched, Baekmuk_Dotum, Baekmuk_Gulim,
> Baekmuk_Headline, Bangla, Bitstream_Vera_Sans, Bitstream_Vera_Sans_Bold,
> Bitstream_Vera_Sans_Bold_Oblique, Bitstream_Vera_Sans_Mono,
> Bitstream_Vera_Sans_Mono_Bold, Bitstream_Vera_Sans_Mono_Bold_Oblique,
> Bitstream_Vera_Sans_Mono_Oblique, Bitstream_Vera_Sans_Mono_Roman,
> Bitstream_Vera_Sans_Oblique, Bitstream_Vera_Sans_Roman, Bitstream_Vera_Serif,
> Bitstream_Vera_Serif_Bold, Bitstream_Vera_Serif_Roman, CaslonishFraxx,
> Century_Schoolbook_L, Century_Schoolbook_L_Bold,
> Century_Schoolbook_L_Bold_Italic, Century_Schoolbook_L_Italic,
> Century_Schoolbook_L_Roman, Chandas, Cloister_Black_Light, Comic_Sans_MS,
> Comic_Sans_MS_Bold, Cortoba, Courier_New, Courier_New_Bold,
> Courier_New_Bold_Italic, Courier_New_Italic, DejaVu_Sans, DejaVu_Sans_Bold,
> DejaVu_Sans_Bold_Oblique, DejaVu_Sans_Condensed, DejaVu_Sans_Condensed_Bold,
> DejaVu_Sans_Condensed_Bold_Oblique, DejaVu_Sans_Condensed_Oblique,
> DejaVu_Sans_Mono, DejaVu_Sans_Mono_Bold, DejaVu_Sans_Mono_Bold_Oblique,
> DejaVu_Sans_Mono_Oblique, DejaVu_Sans_Oblique, DejaVu_Sans_Ultra-Light,
> DejaVu_Serif, DejaVu_Serif_Bold, DejaVu_Serif_Bold_Italic,
> DejaVu_Serif_Bold_Oblique, DejaVu_Serif_Bold_Semi-Condensed,
> DejaVu_Serif_Condensed_Bold, DejaVu_Serif_Condensed_Bold_Italic,
> DejaVu_Serif_Condensed_Italic, DejaVu_Serif_Italic, DejaVu_Serif_Oblique,
> DejaVu_Serif_Semi-Condensed, Dimnah, Dustismo, Dustismo_Roman,
> Dustismo_Roman_Bold, Dustismo_Roman_Italic, Dustismo_Roman_Italic_Bold,
> Dyuthi, East_Syriac_Adiabene, East_Syriac_Ctesiphon, Electron,
> Estrangelo_Antioch, Estrangelo_Edessa, Estrangelo_Midyat, Estrangelo_Nisibin,
> Estrangelo_Quenneshrin, Estrangelo_Talada, Estrangelo_TurAbdin, FreeMono,
> FreeMono_Bold, FreeMono_Bold_Italic, FreeMono_Bold_Oblique, FreeMono_Italic,
> FreeMono_Oblique, FreeSans, FreeSans_Bold, FreeSans_Bold_Oblique,
> FreeSans_Oblique, FreeSerif, FreeSerif_Bold, FreeSerif_Bold_Italic,
> FreeSerif_Italic, Furat, Garuda, Garuda_Bold, Garuda_Bold_Oblique,
> Garuda_Oblique, GentiumAlt, GentiumAlt_Italic, Georgia, Georgia_Bold,
> Georgia_Bold_Italic, Georgia_Italic, Granada, Graph, Hani, Haramain, Hor,
> IPAGothic, IPAMincho, IPAPGothic, IPAPMincho, IPAUIGothic, Impact,
> Impact_Condensed, Jamrul, Jamrul_Semi-Expanded, Japan, Jet, Kalimati,
> Kalyani, Kayrawan, Kedage, Kedage_Bold, Kedage_Bold_Italic, Kedage_Italic,
> Khalid, Khmer_OS, Khmer_OS_Battambang, Khmer_OS_Bokor, Khmer_OS_Content,
> Khmer_OS_Fasthand, Khmer_OS_Freehand, Khmer_OS_Metal_Chrieng, Khmer_OS_Muol,
> Khmer_OS_Muol_Light, Khmer_OS_Muol_Pali, Khmer_OS_Siemreap, Khmer_OS_System,
> Kochi_Gothic, Kochi_Mincho, LKLUG, Lateef, Likhan, Linux_Biolinum_O,
> Linux_Biolinum_O_Bold, Linux_Libertine_O, Linux_Libertine_O_Bold,
> Linux_Libertine_O_Bold_Italic, Linux_Libertine_O_C, Linux_Libertine_O_Italic,
> Lohit_Assamese, Lohit_Bengali, Lohit_Gujarati, Lohit_Hindi, Lohit_Malayalam,
> Lohit_Oriya, Lohit_Punjabi, Lohit_Tamil, Lohit_Telugu, Loma, Loma_Bold,
> Loma_Bold_Oblique, Loma_Oblique, Lucida_Bright, Lucida_Bright_Italic,
> Lucida_Bright_Semi-Bold, Lucida_Bright_Semi-Bold_Italic, Lucida_Sans,
> Lucida_Sans_Oblique, Lucida_Sans_Semi-Bold, Lucida_Sans_Semi-Bold_Oblique,
> Lucida_Sans_Typewriter, Lucida_Sans_Typewriter_Bold,
> Lucida_Sans_Typewriter_Bold_Oblique, Mallige, Mallige_Bold,
> Mallige_Bold_Italic, Mallige_Italic, Mashq, Meera, Metal, Mitra_Mono, Monapo,
> Mukti_Narrow, Mukti_Narrow_Bold, Nada, Nagham, Nice, Norasi, Norasi_Bold,
> Norasi_Bold_Italic, Norasi_Bold_Oblique, Norasi_Italic, Norasi_Oblique,
> OpenSymbol, Ostorah, Padauk, Padauk_Bold, Petra, Phetsarath_OT, Pothana2000,
> Proclamate_Light, Purisa_Light, Rachana, Rachana_w01, RaghuMalayalam, Rehan,
> Rekha, Saab, Salem, Samanata, Samyak_Gujarati, Samyak_Oriya, Sazanami_Gothic,
> Sazanami_Mincho, Scheherazade, Serto_Batnan, Serto_Batnan_Bold,
> Serto_Jerusalem, Serto_Jerusalem_Bold, Serto_Jerusalem_Italic, Serto_Kharput,
> Serto_Malankara, Serto_Mardin, Serto_Mardin_Bold, Serto_Urhoy,
> Serto_Urhoy_Bold, Shado, Sharjah, TAMu_Kadambri, TAMu_Kalyani, TAMu_Maduram,
> TSCu_Comic, TSCu_Paranar, TSCu_Paranar_Bold, TSCu_Paranar_Italic, TSCu_Times,
> TakaoExGothic, TakaoExMincho, TakaoGothic, TakaoMincho, TakaoPGothic,
> TakaoPMincho, Tarablus, Tholoth, Tibetan_Machine_Uni, Times_New_Roman,
> Times_New_Roman_Bold, Times_New_Roman_Bold_Italic, Times_New_Roman_Italic,
> TlwgMono, TlwgMono_Bold, TlwgMono_Bold_Oblique, TlwgMono_Oblique,
> TlwgTypewriter, TlwgTypewriter_Bold, TlwgTypewriter_Bold_Oblique,
> TlwgTypewriter_Oblique, Trebuchet_MS, Trebuchet_MS_Bold,
> Trebuchet_MS_Bold_Italic, Trebuchet_MS_Italic, URW_Bookman_L,
> URW_Bookman_L_Bold, URW_Bookman_L_Bold_Italic, URW_Bookman_L_Italic,
> URW_Bookman_L_Light_Italic, UmePlus_Gothic, UmePlus_P_Gothic, UnBatang,
> UnBatang_Bold, UnDotum, UnDotum_Bold, UnifrakturMaguntia, Unikurd_Web,
> Uttara, VL_Gothic, VL_PGothic, Vemana2000, Verdana, Verdana_Bold,
> Verdana_Bold_Italic, Verdana_Italic, Walbaum-Fraktur, Webdings,
> WenQuanYi_Zen_Hei, Wyld, Wyld_Italic, aakar, batang, chandas1-1, chandas1-2,
> cheluvi, dotum, gargi, gulim, hline, ipag, ipagp, ipagui, ipam, ipamp,
> kalimati, kochi-gothic, kochi-gothic-subst, kochi-mincho, kochi-mincho-subst,
> lklug, lohit_bn, lohit_gu, lohit_hi, lohit_ml, lohit_or, lohit_pa, lohit_ta,
> lohit_te, monapo, ori1Uni, padmaa, padmaa_Bold, suruma

