On Sun, Oct 21, 2012 at 10:07:17PM +0200, zdenko podobny wrote:
> I have no idea how it works on Mac.

I looked into this today actually, when rewriting the ReadMe. It looks
like the homebrew package asks if you want to install all the language
files, and if so it just unpacks each one to the tessdata directory.

> There are no information about cube training, but there are cube files for
> ara, eng, fra, hin, ita, rus, spa already (and cube support within tesseract).

I thought cube was mainly useful for scripts like Arabic and Devanagari.
Evidently not. Is there anywhere that I can read more about it yet?
(Source code doesn't count.)

> Personally I would prefer if tesseract/combine_tessdata can handle
> (un)compression. I got this idea when I read article about using compression
> for fast boot up of computers. Maybe this would improve init time of tesseract
> (if it reduce disk I/O ;-)) and save storage space (IMO needed on mobile
> phones)...

Yes, that would be nice, I agree. Just linking with zlib should be quite
easy. I doubt I'll have time to make it work, though. But I agree
somebody should ;) And you're right, halving the disk I/O would likely
reduce the startup time significantly.

> And there is also idea about protecting language data file[1].

Umm, that doesn't make sense, really. Tesseract will have to decrypt
them anyway, so the key has to be available, so anybody could decrypt
them.

But apart from the practicalities, when would that ever be a good idea?
Being able to inspect and learn from other trainings is very useful (I
know you and I have both used that to help our trainings), and without
the source files any old trainings can't be usefully maintained by a
3rd party if the original trainer goes away (another argument for
sharing all box/tif files, btw, which I still wish was done more).
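On the zlib idea above, here is a rough Python sketch of the round trip, just to illustrate the principle. It is not Tesseract or combine_tessdata code: the pack/unpack helpers are hypothetical names, and the sample bytes are a synthetic, highly repetitive stand-in for a data component, so the ratio it prints says nothing about real traineddata files.

```python
import zlib


def pack(raw: bytes, level: int = 6) -> bytes:
    """Compress one data component with zlib (hypothetical helper)."""
    return zlib.compress(raw, level)


def unpack(packed: bytes) -> bytes:
    """Restore the original bytes of a component (hypothetical helper)."""
    return zlib.decompress(packed)


# Synthetic stand-in for a language data component; real files will
# compress differently.
sample = b"unicharset entry\n" * 4096

packed = pack(sample)
restored = unpack(packed)

ratio = len(packed) / len(sample)
print(f"{len(sample)} bytes -> {len(packed)} bytes (ratio {ratio:.3f})")
```

Whether this actually helps init time would depend on whether the saved disk reads outweigh the CPU cost of decompressing, which would need measuring on real data.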
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

