This appears to be an issue with --find_fonts and/or --strip_unrenderable_words. The following command, which omits both flags and names the font explicitly with --font, succeeds for me:

$ text2image --exposure=0 --font "Helvetica Neue Thin" --outputbase=eng.Helvetica_Neue_Thin.exp0 --text=/Users/ryan/source/tesseract/tesseract-ocr.langdata/eng/eng.training_text --leading=32 --char_spacing=0.0 --box_padding=0
Initializing fontconfig
Rendered page 0 to file eng.Helvetica_Neue_Thin.exp0.tif
Rendered page 1 to file eng.Helvetica_Neue_Thin.exp0.tif

-Ryan

On Tuesday, March 31, 2015 at 3:43:23 PM UTC-4, Philip Pearl wrote:
>
> Hi All
>
> I'm trying to train tesseract for the first time on my Mac. I'm running
> text2image as follows, but it is crashing in Pango as the priv data on the
> font is NULL.
>
> /usr/local/Cellar/tesseract/HEAD/bin//text2image --leading=32
> --fonts_dir=/Library/Fonts --box_padding=0 --strip_unrenderable_words
> --char_spacing=0.0 --exposure=0 --find_fonts=true
> --outputbase=/tmp/tesstrain/eng/eng.Helvetica_Neue_Thin.exp0
> --text=./tesslang/eng/eng.training_text
>
> Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
>
> 0 libpangoft2-1.0.0.dylib 0x00000001090fad9e
> pango_fc_font_get_glyph + 25
>
> 1 text2image 0x000000010858bf58
> tesseract::PangoFontInfo::CanRenderString(char const*, int,
> std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> >,
> std::__1::allocator<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > > >*) const + 322
>
> 2 text2image 0x000000010858d0ab
> tesseract::FontUtils::SelectFont(char const*, int,
> std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> >,
> std::__1::allocator<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > > > const&,
> std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> >*, std::__1::vector<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> >,
> std::__1::allocator<std::__1::basic_string<char,
> std::__1::char_traits<char>, std::__1::allocator<char> > > >*) + 287
>
> 3 text2image 0x0000000108592c06
> tesseract::StringRenderer::RenderAllFontsToImage(double, char const*, int,
> std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> >*, Pix**) + 108
>
> 4 text2image 0x0000000108584149 main + 2750
>
> 5 libdyld.dylib 0x00007fff932315fd start + 1
>
> I installed from HEAD using homebrew and the instructions I found here
> https://ryanfb.github.io/etc/2014/11/19/installing_tesseract_training_tools_on_mac_os_x.html
>
> - Any ideas how to get around this crash?
> - Am I crazy running this on my Mac? Would I be better off with a
>   Linux VM?
> - Does training from fonts work or am I better off starting with
>   images (my data is analog HD screen captures of TV menus!)? I know the
>   font the menus use.
>
> Thanks in advance for any help or advice you are able to give me.
>
> Phil