Hi Ryan 

Thanks very much for such a useful answer!  I'm building your Docker 
container as I type, and I'll try font training once it's built.

I tried training with boxes and images, but it complained about a good 
number of my boxes, saying it couldn't detect blobs within them. I'm 
guessing my problem is that I don't have good separation of characters, 
so I plan to look at whether I can just remove those boxes or whether 
to edit the images to remove some characters.
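
For the "just remove those boxes" option, something like the sketch below 
might do. It assumes the usual Tesseract 3.x box file layout, one glyph per 
line as "symbol left bottom right top page", and that I collect the line 
numbers of the rejected boxes by hand from the training output. The names 
parse_box_line, drop_boxes and rejected are made up for illustration; they 
are not part of Tesseract.

```python
from typing import Iterable, List, NamedTuple, Set


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse one box line (assumed layout: symbol left bottom right top page)."""
    # Split from the right so the symbol field is kept whole.
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def drop_boxes(lines: Iterable[str], rejected: Set[int]) -> List[str]:
    """Return the box lines whose 1-based line number is not in `rejected`."""
    kept = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines; they still count towards numbering
        parse_box_line(line)  # raises ValueError if the line is malformed
        if number not in rejected:
            kept.append(line.rstrip("\n"))
    return kept
```

Reading the .box file with readlines(), passing the lines through 
drop_boxes, and writing the result back out joined with newlines would give 
a box file without the problem entries. The image itself is left unchanged, 
so this only covers the first of the two options.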

Phil

On Wednesday, 1 April 2015 17:59:35 UTC+1, Ryan Baumann wrote:
>
> Also, to answer your other questions:
>
>
>    - There appear to be some other issues with Pango/Cairo rendering 
>    under OS X which may impact the training process. As a result, and for 
>    general replicability, I now use a Dockerized Linux environment to do 
>    Tesseract training on my Mac: 
>    https://github.com/ryanfb/tesseract_latinocr_docker
>    - Training from fonts works surprisingly well, but if there are 
>    significant artifacts introduced by your pipeline/capture process, you may 
>    get better accuracy with a manual box/train against images.
>
> -Ryan
>
> On Tuesday, March 31, 2015 at 3:43:23 PM UTC-4, Philip Pearl wrote:
>>
>> Hi All
>>
>> I'm trying to train tesseract for the first time on my Mac.  I'm running 
>> text2image as follows, but it is crashing in Pango as the priv data on the 
>> font is NULL.
>>
>> /usr/local/Cellar/tesseract/HEAD/bin//text2image --leading=32 
>> --fonts_dir=/Library/Fonts --box_padding=0 --strip_unrenderable_words 
>> --char_spacing=0.0 --exposure=0 --find_fonts=true 
>> --outputbase=/tmp/tesstrain/eng/eng.Helvetica_Neue_Thin.exp0 
>> --text=./tesslang/eng/eng.training_text
>>
>> Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
>>
>> 0   libpangoft2-1.0.0.dylib             0x00000001090fad9e 
>> pango_fc_font_get_glyph + 25
>>
>> 1   text2image                          0x000000010858bf58 
>> tesseract::PangoFontInfo::CanRenderString(char const*, int, 
>> std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, 
>> std::__1::allocator<char> >, 
>> std::__1::allocator<std::__1::basic_string<char, 
>> std::__1::char_traits<char>, std::__1::allocator<char> > > >*) const + 322
>>
>> 2   text2image                          0x000000010858d0ab 
>> tesseract::FontUtils::SelectFont(char const*, int, 
>> std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, 
>> std::__1::allocator<char> >, 
>> std::__1::allocator<std::__1::basic_string<char, 
>> std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, 
>> std::__1::basic_string<char, std::__1::char_traits<char>, 
>> std::__1::allocator<char> >*, std::__1::vector<std::__1::basic_string<char, 
>> std::__1::char_traits<char>, std::__1::allocator<char> >, 
>> std::__1::allocator<std::__1::basic_string<char, 
>> std::__1::char_traits<char>, std::__1::allocator<char> > > >*) + 287
>>
>> 3   text2image                          0x0000000108592c06 
>> tesseract::StringRenderer::RenderAllFontsToImage(double, char const*, int, 
>> std::__1::basic_string<char, std::__1::char_traits<char>, 
>> std::__1::allocator<char> >*, Pix**) + 108
>>
>> 4   text2image                          0x0000000108584149 main + 2750
>>
>> 5   libdyld.dylib                       0x00007fff932315fd start + 1
>>
>>
>> I installed from HEAD using homebrew and the instructions I found here 
>> https://ryanfb.github.io/etc/2014/11/19/installing_tesseract_training_tools_on_mac_os_x.html
>>
>>
>>    - Any ideas how to get around this crash?
>>    - Am I crazy running this on my Mac?  Would I be better off with a 
>>    Linux VM?
>>    - Does training from fonts work or am I better off starting with 
>>    images (my data is analog HD screen captures of TV menus!)? I know the 
>>    font the menus use.
>>
>> Thanks in advance for any help or advice you are able to give me.
>>
>> Phil
>>
>>
