On Wednesday, August 17, 2016 at 9:37:12 AM UTC-4, zdenop wrote:
> If there is other solution how to separate "must" part of the project with
> "optional" data on github.com, please share it.

The issue is that this separation creates the problem I mentioned: you can't simply clone the GitHub repository and use it directly as your tessdata dir. Combining the two (optional and "must", as you say) would mean that some people would probably delete files from their clone to keep disk-space usage down. At least that's what I did.

My own process is to install tesseract via Homebrew. That gets me a minimal set-up with respect to the trained data files, and it means I get updated whenever a major release makes it to Homebrew. Then I use the data files from GitHub via symlinks. This means that when tesseract gets updated via Homebrew, I have to recreate the symlinks. Not a big deal, but not nothing either.

So it's a trade-off. Some people would likely modify their set-up in either case, either to copy or link files as now, or to delete them. My current thinking is that the latter (deleting what I don't need) would be preferable for me, but I recognize that not everyone will agree with that.

I assume it's possible to have an installation via Homebrew (or whatever) that ignores the "extra" data files, or possibly two separate installations, a minimal and a full one.
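
For anyone who wants to automate the symlink step described above, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not anything tesseract or Homebrew provides: the function name `relink_traineddata` is hypothetical, and both directories are passed in by the caller, since the actual tessdata location depends on the install. It links every `*.traineddata` file from a clone into the tessdata dir and leaves files that came with the install untouched.

```python
from pathlib import Path


def relink_traineddata(clone_dir, tessdata_dir):
    """Symlink every *.traineddata file in clone_dir into tessdata_dir.

    Regular files already present in tessdata_dir (for example, the ones
    that came with the install) are left alone. Existing symlinks with the
    same name are replaced. Returns the list of file names that were linked.
    """
    clone_dir = Path(clone_dir).resolve()
    tessdata_dir = Path(tessdata_dir)
    linked = []
    for src in sorted(clone_dir.glob("*.traineddata")):
        dest = tessdata_dir / src.name
        if dest.is_symlink():
            dest.unlink()      # drop an old link, which may be dangling
        elif dest.exists():
            continue           # keep the file shipped with the install
        dest.symlink_to(src)
        linked.append(src.name)
    return linked
```

You would call it after each update with your own paths, e.g. `relink_traineddata("/path/to/tessdata-clone", "/path/to/installed/tessdata")` (both paths are placeholders). Running it twice is harmless, since existing links are simply recreated.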

