On Sun, Oct 21, 2012 at 10:07:17PM +0200, zdenko podobny wrote:
> I have no idea how it works on Mac.

I looked into this today, actually, when rewriting the ReadMe. It
looks like the Homebrew package asks if you want to install all the
language files, and if so it just unpacks each one to the tessdata
directory.

> There are no information about cube training, but there are cube files for
> ara, eng, fra, hin, ita, rus, spa already (and cube support within
> tesseract).

I thought cube was mainly useful for scripts like Arabic and
Devanagari. Evidently not. Is there anywhere that I can read more
about it yet? (Source code doesn't count.)
 
> Personally I would prefer if tesseract/combine_tessdata can handle
> (un)compression. I got this idea when I read article about using
> compression for fast boot up of computers. Maybe this would improve
> init time of tesseract (if it reduce disk I/O ;-)) and save storage
> space (IMO needed on mobile phones)...

Yes, that would be nice, I agree. Just linking with zlib should be
quite easy. I doubt I'll have time to make it work, though. But I
agree somebody should ;) And you're right, halving the disk I/O
would likely reduce the startup time significantly.
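
To make the idea concrete, here is a rough sketch of the round trip in
Python rather than in Tesseract's C++. Nothing here is Tesseract code:
pack() and unpack() are hypothetical helper names, and the sample bytes
are a made-up stand-in for a language data file, so the ratio it gives
says nothing about real traineddata files.

```python
import zlib


def pack(data: bytes, level: int = 9) -> bytes:
    """Compress a blob with zlib (hypothetical helper, not a Tesseract API)."""
    return zlib.compress(data, level)


def unpack(blob: bytes) -> bytes:
    """Restore the original bytes from a zlib-compressed blob."""
    return zlib.decompress(blob)


# Made-up stand-in for a language data file; repetitive text compresses well.
sample = b"unicharset entry\n" * 1000

packed = pack(sample)
restored = unpack(packed)

# Fraction of the original size that would have to be read from disk.
ratio = len(packed) / len(sample)
print(f"original={len(sample)} packed={len(packed)} ratio={ratio:.3f}")
```

Whether this actually helps init time depends on whether the saved
disk I/O outweighs the time spent decompressing, which would need
measuring on real data.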

> And there is also idea about protecting language data file[1].

Umm, that doesn't make sense, really. Tesseract will have to decrypt
them anyway, so the key has to be available, so anybody could
decrypt them.

But apart from the practicalities, when would that ever be a good
idea? Being able to inspect and learn from other trainings is very
useful (I know you and I have both used that to help our trainings),
and without the source files any old trainings can't be usefully
maintained by a third party if the original trainer goes away
(another argument for sharing all box/tif files, btw, which I still
wish was done more).
