Also, to answer your other questions:

   - There appear to be some other issues with Pango/Cairo rendering under 
   OS X that may affect the training process. As a result, and for general 
   replicability, I now use a Dockerized Linux environment for Tesseract 
   training on my Mac: https://github.com/ryanfb/tesseract_latinocr_docker
   - Training from fonts works surprisingly well, but if your pipeline or 
   capture process introduces significant artifacts, you may get better 
   accuracy by manually boxing and training against images.
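If you go the manual box/train route, errors tend to creep into the hand-corrected .box files. In Tesseract 3.x these are typically generated with `tesseract <image> <base> batch.nochop makebox` and then edited glyph by glyph. The following is a minimal sketch of a sanity check. It assumes the Tesseract 3.x box format `<symbol> <left> <bottom> <right> <top> <page>` (one glyph per line, origin at the bottom-left of the image). `check_box_file` is a hypothetical helper for illustration and is not part of Tesseract.

```shell
# Hypothetical helper: sanity-check a hand-edited Tesseract 3.x .box file.
# Assumed line format: <symbol> <left> <bottom> <right> <top> <page>
# Prints one message per suspicious line; returns non-zero if any were found.
check_box_file() {
  awk '
    # Every line should have exactly six whitespace-separated fields.
    NF != 6 {
      printf "line %d: expected 6 fields, got %d\n", NR, NF; bad = 1; next
    }
    # Coordinates and page number should be non-negative integers.
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ ||
    $5 !~ /^[0-9]+$/ || $6 !~ /^[0-9]+$/ {
      printf "line %d: non-integer coordinate for \"%s\"\n", NR, $1; bad = 1; next
    }
    # With a bottom-left origin, left <= right and bottom <= top.
    $2 + 0 > $4 + 0 || $3 + 0 > $5 + 0 {
      printf "line %d: inverted box for \"%s\"\n", NR, $1; bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

For example, `check_box_file eng.myfont.exp0.box` prints nothing and returns 0 for a well-formed file. The file name is only illustrative. A single awk pass keeps the check dependency-free, so it runs the same on OS X and inside a Linux container.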

-Ryan

On Tuesday, March 31, 2015 at 3:43:23 PM UTC-4, Philip Pearl wrote:
>
> Hi All
>
> I'm trying to train tesseract for the first time on my Mac.  I'm running 
> text2image as follows, but it is crashing in Pango as the priv data on the 
> font is NULL.
>
> /usr/local/Cellar/tesseract/HEAD/bin//text2image --leading=32 
> --fonts_dir=/Library/Fonts --box_padding=0 --strip_unrenderable_words 
> --char_spacing=0.0 --exposure=0 --find_fonts=true 
> --outputbase=/tmp/tesstrain/eng/eng.Helvetica_Neue_Thin.exp0 
> --text=./tesslang/eng/eng.training_text
>
> Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
>
> 0   libpangoft2-1.0.0.dylib             0x00000001090fad9e 
> pango_fc_font_get_glyph + 25
>
> 1   text2image                          0x000000010858bf58 
> tesseract::PangoFontInfo::CanRenderString(char const*, int, 
> std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, 
> std::__1::allocator<char> >, 
> std::__1::allocator<std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> > > >*) const + 322
>
> 2   text2image                          0x000000010858d0ab 
> tesseract::FontUtils::SelectFont(char const*, int, 
> std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, 
> std::__1::allocator<char> >, 
> std::__1::allocator<std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, 
> std::__1::basic_string<char, std::__1::char_traits<char>, 
> std::__1::allocator<char> >*, std::__1::vector<std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> >, 
> std::__1::allocator<std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> > > >*) + 287
>
> 3   text2image                          0x0000000108592c06 
> tesseract::StringRenderer::RenderAllFontsToImage(double, char const*, int, 
> std::__1::basic_string<char, std::__1::char_traits<char>, 
> std::__1::allocator<char> >*, Pix**) + 108
>
> 4   text2image                          0x0000000108584149 main + 2750
>
> 5   libdyld.dylib                       0x00007fff932315fd start + 1
>
>
> I installed from HEAD using homebrew and the instructions I found here 
> https://ryanfb.github.io/etc/2014/11/19/installing_tesseract_training_tools_on_mac_os_x.html
>
>
>    - Any ideas how to get around this crash?
>    - Am I crazy running this on my Mac?  Would I be better off with a 
>    Linux VM?
>    - Does training from fonts work or am I better off starting with 
>    images (my data is analog HD screen captures of TV menus!)? I know the 
>    font the menus use.
>
> Thanks in advance for any help or advice you are able to give me.
>
> Phil
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/ef699c4e-79ff-4524-9ee4-a76bda1a2ced%40googlegroups.com.