The manual "Training Tesseract 3" says:

> Tesseract needs to know about different shapes of the same character by
> having different fonts separated explicitly.
> This used to be limited to 32 fonts, but the limit has been raised to 64.
> It is set by the constant MAX_NUM_CONFIGS defined in intproto.h.
> Note that runtime is heavily dependent on the number of fonts provided,
> and training more than 32 will result in a significant slow-down.



I analyzed the number of fonts in eng.traineddata and was very surprised
to find that 358 fonts have been trained:
get_fontinfo_table().size() returns 358.


Can anybody explain this contradiction to me?




Fonts in eng.traineddata:

 AR_PL_UKai_CN,
 AR_PL_UKai_Patched,
 AR_PL_UKai_TW,
 AR_PL_UMing_CN_Light,
 AR_PL_UMing_Patched_Light,
 AR_PL_UMing_TW_MBE_Light,
 Aboriginal_Sans,
 Aboriginal_Sans_Bold_Italic,
 Aboriginal_Sans_Italic,
 Aboriginal_Serif,
 Aboriginal_Serif_Bold,
 Aboriginal_Serif_Bold_Italic,
 Aboriginal_Serif_Italic,
 Abyssinica_SIL,
 AlArabiya,
 AlBattar,
 AlHor,
 AlManzomah,
 AlMohanad,
 Andale_Mono,
 Ani,
 AnjaliOldLipi,
 Arab,
 Arial,
 Arial_Black,
 Arial_Bold,
 Arial_Bold_Italic,
 Arial_Italic,
 BPG_Chveulebrivi,
 BPG_Chveulebrivi_Bold,
 BPG_Courier,
 BPG_Courier_Bold,
 BPG_Elite,
 BPG_Elite_Bold,
 BPG_Glaho,
 BPG_Glaho_Bold,
 BPG_Rioni,
 BPG_Rioni_Bold,
 BPG_Unicode_Standard,
 Baekmuk_Batang,
 Baekmuk_Batang_Patched,
 Baekmuk_Dotum,
 Baekmuk_Gulim,
 Baekmuk_Headline,
 Bangla,
 Bitstream_Vera_Sans,
 Bitstream_Vera_Sans_Bold,
 Bitstream_Vera_Sans_Bold_Oblique,
 Bitstream_Vera_Sans_Mono,
 Bitstream_Vera_Sans_Mono_Bold,
 Bitstream_Vera_Sans_Mono_Bold_Oblique,
 Bitstream_Vera_Sans_Mono_Oblique,
 Bitstream_Vera_Sans_Mono_Roman,
 Bitstream_Vera_Sans_Oblique,
 Bitstream_Vera_Sans_Roman,
 Bitstream_Vera_Serif,
 Bitstream_Vera_Serif_Bold,
 Bitstream_Vera_Serif_Roman,
 CaslonishFraxx,
 Century_Schoolbook_L,
 Century_Schoolbook_L_Bold,
 Century_Schoolbook_L_Bold_Italic,
 Century_Schoolbook_L_Italic,
 Century_Schoolbook_L_Roman,
 Chandas,
 Cloister_Black_Light,
 Comic_Sans_MS,
 Comic_Sans_MS_Bold,
 Cortoba,
 Courier_New,
 Courier_New_Bold,
 Courier_New_Bold_Italic,
 Courier_New_Italic,
 DejaVu_Sans,
 DejaVu_Sans_Bold,
 DejaVu_Sans_Bold_Oblique,
 DejaVu_Sans_Condensed,
 DejaVu_Sans_Condensed_Bold,
 DejaVu_Sans_Condensed_Bold_Oblique,
 DejaVu_Sans_Condensed_Oblique,
 DejaVu_Sans_Mono,
 DejaVu_Sans_Mono_Bold,
 DejaVu_Sans_Mono_Bold_Oblique,
 DejaVu_Sans_Mono_Oblique,
 DejaVu_Sans_Oblique,
 DejaVu_Sans_Ultra-Light,
 DejaVu_Serif,
 DejaVu_Serif_Bold,
 DejaVu_Serif_Bold_Italic,
 DejaVu_Serif_Bold_Oblique,
 DejaVu_Serif_Bold_Semi-Condensed,
 DejaVu_Serif_Condensed_Bold,
 DejaVu_Serif_Condensed_Bold_Italic,
 DejaVu_Serif_Condensed_Italic,
 DejaVu_Serif_Italic,
 DejaVu_Serif_Oblique,
 DejaVu_Serif_Semi-Condensed,
 Dimnah,
 Dustismo,
 Dustismo_Roman,
 Dustismo_Roman_Bold,
 Dustismo_Roman_Italic,
 Dustismo_Roman_Italic_Bold,
 Dyuthi,
 East_Syriac_Adiabene,
 East_Syriac_Ctesiphon,
 Electron,
 Estrangelo_Antioch,
 Estrangelo_Edessa,
 Estrangelo_Midyat,
 Estrangelo_Nisibin,
 Estrangelo_Quenneshrin,
 Estrangelo_Talada,
 Estrangelo_TurAbdin,
 FreeMono,
 FreeMono_Bold,
 FreeMono_Bold_Italic,
 FreeMono_Bold_Oblique,
 FreeMono_Italic,
 FreeMono_Oblique,
 FreeSans,
 FreeSans_Bold,
 FreeSans_Bold_Oblique,
 FreeSans_Oblique,
 FreeSerif,
 FreeSerif_Bold,
 FreeSerif_Bold_Italic,
 FreeSerif_Italic,
 Furat,
 Garuda,
 Garuda_Bold,
 Garuda_Bold_Oblique,
 Garuda_Oblique,
 GentiumAlt,
 GentiumAlt_Italic,
 Georgia,
 Georgia_Bold,
 Georgia_Bold_Italic,
 Georgia_Italic,
 Granada,
 Graph,
 Hani,
 Haramain,
 Hor,
 IPAGothic,
 IPAMincho,
 IPAPGothic,
 IPAPMincho,
 IPAUIGothic,
 Impact,
 Impact_Condensed,
 Jamrul,
 Jamrul_Semi-Expanded,
 Japan,
 Jet,
 Kalimati,
 Kalyani,
 Kayrawan,
 Kedage,
 Kedage_Bold,
 Kedage_Bold_Italic,
 Kedage_Italic,
 Khalid,
 Khmer_OS,
 Khmer_OS_Battambang,
 Khmer_OS_Bokor,
 Khmer_OS_Content,
 Khmer_OS_Fasthand,
 Khmer_OS_Freehand,
 Khmer_OS_Metal_Chrieng,
 Khmer_OS_Muol,
 Khmer_OS_Muol_Light,
 Khmer_OS_Muol_Pali,
 Khmer_OS_Siemreap,
 Khmer_OS_System,
 Kochi_Gothic,
 Kochi_Mincho,
 LKLUG,
 Lateef,
 Likhan,
 Linux_Biolinum_O,
 Linux_Biolinum_O_Bold,
 Linux_Libertine_O,
 Linux_Libertine_O_Bold,
 Linux_Libertine_O_Bold_Italic,
 Linux_Libertine_O_C,
 Linux_Libertine_O_Italic,
 Lohit_Assamese,
 Lohit_Bengali,
 Lohit_Gujarati,
 Lohit_Hindi,
 Lohit_Malayalam,
 Lohit_Oriya,
 Lohit_Punjabi,
 Lohit_Tamil,
 Lohit_Telugu,
 Loma,
 Loma_Bold,
 Loma_Bold_Oblique,
 Loma_Oblique,
 Lucida_Bright,
 Lucida_Bright_Italic,
 Lucida_Bright_Semi-Bold,
 Lucida_Bright_Semi-Bold_Italic,
 Lucida_Sans,
 Lucida_Sans_Oblique,
 Lucida_Sans_Semi-Bold,
 Lucida_Sans_Semi-Bold_Oblique,
 Lucida_Sans_Typewriter,
 Lucida_Sans_Typewriter_Bold,
 Lucida_Sans_Typewriter_Bold_Oblique,
 Mallige,
 Mallige_Bold,
 Mallige_Bold_Italic,
 Mallige_Italic,
 Mashq,
 Meera,
 Metal,
 Mitra_Mono,
 Monapo,
 Mukti_Narrow,
 Mukti_Narrow_Bold,
 Nada,
 Nagham,
 Nice,
 Norasi,
 Norasi_Bold,
 Norasi_Bold_Italic,
 Norasi_Bold_Oblique,
 Norasi_Italic,
 Norasi_Oblique,
 OpenSymbol,
 Ostorah,
 Padauk,
 Padauk_Bold,
 Petra,
 Phetsarath_OT,
 Pothana2000,
 Proclamate_Light,
 Purisa_Light,
 Rachana,
 Rachana_w01,
 RaghuMalayalam,
 Rehan,
 Rekha,
 Saab,
 Salem,
 Samanata,
 Samyak_Gujarati,
 Samyak_Oriya,
 Sazanami_Gothic,
 Sazanami_Mincho,
 Scheherazade,
 Serto_Batnan,
 Serto_Batnan_Bold,
 Serto_Jerusalem,
 Serto_Jerusalem_Bold,
 Serto_Jerusalem_Italic,
 Serto_Kharput,
 Serto_Malankara,
 Serto_Mardin,
 Serto_Mardin_Bold,
 Serto_Urhoy,
 Serto_Urhoy_Bold,
 Shado,
 Sharjah,
 TAMu_Kadambri,
 TAMu_Kalyani,
 TAMu_Maduram,
 TSCu_Comic,
 TSCu_Paranar,
 TSCu_Paranar_Bold,
 TSCu_Paranar_Italic,
 TSCu_Times,
 TakaoExGothic,
 TakaoExMincho,
 TakaoGothic,
 TakaoMincho,
 TakaoPGothic,
 TakaoPMincho,
 Tarablus,
 Tholoth,
 Tibetan_Machine_Uni,
 Times_New_Roman,
 Times_New_Roman_Bold,
 Times_New_Roman_Bold_Italic,
 Times_New_Roman_Italic,
 TlwgMono,
 TlwgMono_Bold,
 TlwgMono_Bold_Oblique,
 TlwgMono_Oblique,
 TlwgTypewriter,
 TlwgTypewriter_Bold,
 TlwgTypewriter_Bold_Oblique,
 TlwgTypewriter_Oblique,
 Trebuchet_MS,
 Trebuchet_MS_Bold,
 Trebuchet_MS_Bold_Italic,
 Trebuchet_MS_Italic,
 URW_Bookman_L,
 URW_Bookman_L_Bold,
 URW_Bookman_L_Bold_Italic,
 URW_Bookman_L_Italic,
 URW_Bookman_L_Light_Italic,
 UmePlus_Gothic,
 UmePlus_P_Gothic,
 UnBatang,
 UnBatang_Bold,
 UnDotum,
 UnDotum_Bold,
 UnifrakturMaguntia,
 Unikurd_Web,
 Uttara,
 VL_Gothic,
 VL_PGothic,
 Vemana2000,
 Verdana,
 Verdana_Bold,
 Verdana_Bold_Italic,
 Verdana_Italic,
 Walbaum-Fraktur,
 Webdings,
 WenQuanYi_Zen_Hei,
 Wyld,
 Wyld_Italic,
 aakar,
 batang,
 chandas1-1,
 chandas1-2,
 cheluvi,
 dotum,
 gargi,
 gulim,
 hline,
 ipag,
 ipagp,
 ipagui,
 ipam,
 ipamp,
 kalimati,
 kochi-gothic,
 kochi-gothic-subst,
 kochi-mincho,
 kochi-mincho-subst,
 lklug,
 lohit_bn,
 lohit_gu,
 lohit_hi,
 lohit_ml,
 lohit_or,
 lohit_pa,
 lohit_ta,
 lohit_te,
 monapo,
 ori1Uni,
 padmaa,
 padmaa_Bold,
 suruma
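
For anyone who wants to cross-check the list against the reported count, here is a minimal sketch in Python. It assumes the list has been pasted as plain text in the format shown above (one comma-terminated name per line); the helper names `parse_font_list` and `case_insensitive_duplicates` are made up for illustration and are not part of Tesseract. It counts the entries and reports names that differ only by letter case, which the list above does contain (for example Kalimati and kalimati).

```python
from collections import Counter


def parse_font_list(text):
    """Split a comma-separated font list (one name per line) into clean names."""
    # Treat newlines like commas, then drop empty fragments and padding.
    return [name.strip() for name in text.replace("\n", ",").split(",") if name.strip()]


def case_insensitive_duplicates(names):
    """Return the lowercased names that occur more than once when case is ignored."""
    counts = Counter(name.lower() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)


# Short excerpt from the list above; paste the full list to check all entries.
sample = """
 Kalimati,
 LKLUG,
 Monapo,
 kalimati,
 lklug,
 monapo,
 suruma
"""

fonts = parse_font_list(sample)
print(len(fonts))                          # 7 entries in this excerpt
print(case_insensitive_duplicates(fonts))  # ['kalimati', 'lklug', 'monapo']
```

This only inspects the names as text. It says nothing about how Tesseract stores or uses the fonts internally.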
