Dear Friends,

I am using tesseract-3.02 for extracting text from scanned images.

As you know, tesseract can be integrated into an application. Internally, 
it uses a number of training data files, config files and so on, which are 
installed in the tessdata folder.

When I listed these files, I found that there are at least 29 files:

   1. eng.traineddata
   2. eng.cube.bigrams
   3. eng.cube.fold
   4. eng.cube.size
   5. eng.cube.nn
   6. eng.cube.params
   7. eng.cube.word-freq
   8. eng.tesseract_cube.nn
   9. eng.cube.lm
   10. osd.traineddata
   11. logfile
   12. api_config
   13. box.train
   14. box.train.stderr
   15. digits
   16. hocr
   17. inter
   18. linebox
   19. ambigs.train
   20. makebox
   21. rebox
   22. strokewidth
   23. unlv
   24. batch.nochop
   25. matdemo
   26. msdemo
   27. nobatch
   28. segdemo
   29. batch
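
For reference, a listing like the one above, together with the disk space each file takes, can be produced with a short script. This is only a minimal sketch: the function names and the default path /usr/local/share/tessdata are my own assumptions for illustration (the tessdata location depends on how tesseract was installed), and it uses only the Python standard library, not any tesseract API.

```python
import os
import sys


def list_tessdata(tessdata_dir):
    """Return (relative_path, size_in_bytes) for every regular file
    under tessdata_dir, including subfolders, sorted by path."""
    entries = []
    for root, _dirs, files in os.walk(tessdata_dir):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, tessdata_dir)
            entries.append((rel_path, os.path.getsize(full_path)))
    return sorted(entries)


def total_size(entries):
    """Sum the sizes (in bytes) of all listed files."""
    return sum(size for _path, size in entries)


if __name__ == "__main__":
    # Assumed default location; pass the real tessdata path as an argument.
    tessdata_dir = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/share/tessdata"
    entries = list_tessdata(tessdata_dir)
    for number, (path, size) in enumerate(entries, start=1):
        print("%3d. %-30s %10d bytes" % (number, path, size))
    print("Total: %d files, %d bytes" % (len(entries), total_size(entries)))
```

Running it with the tessdata folder as the only argument prints one numbered line per file plus a total, which shows which files dominate the disk usage.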
   

Could you please help me with the following questions:
1) Are all of these files needed when tesseract performs OCR?
    -- If not, what is the minimum set of files tesseract requires to 
    work correctly?

2) Can we combine these mandatory files into one file and use it with 
tesseract without unpacking?
    -- I have a disk space constraint and also want to reduce the number of 
    reads from the disk.

Many thanks in advance for your guidance and time.

Best Regards,
- ganesh
