[tesseract-ocr] How to generate multiple tessedit_write_images output

2018-07-02 Thread Junye Li
Hi there, I want to see the actual input images processed by Tesseract, so I ran the tesseract command with -c tessedit_write_images=TRUE. However, when I pass a multi-layer (multiple pages) .tiff image to tesseract, the output tessinput.tif image only contains one layer, which is the last
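
For reference, here is a minimal sketch of the kind of invocation the message describes, written as a small Python wrapper. The file names `multipage.tif` and `out` and both helper functions are hypothetical, and the assumption that `tessinput.tif` lands in the current working directory reflects Tesseract 4.x behaviour as far as I know; other versions may name or place the file differently.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base):
    """Return the argv list for an OCR run that also dumps the processed input image."""
    return [
        "tesseract",
        str(image_path),
        str(output_base),
        "-c",
        "tessedit_write_images=true",
    ]


def run_with_debug_image(image_path, output_base):
    """Run tesseract if it is installed and return the expected debug image path."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # Assumption: the debug image is written as tessinput.tif in the current directory.
    return Path.cwd() / "tessinput.tif"


if __name__ == "__main__":
    # Hypothetical file names, used only to show the resulting command line.
    print(" ".join(build_tesseract_command("multipage.tif", "out")))
```

Since the file name is fixed, each page of a multi-page input would overwrite the previous one, which is consistent with only the last page surviving as described in the message.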

[tesseract-ocr] Re: Tesseract Latest version

2018-11-25 Thread Junye Li
I believe the latest version is Tesseract 4.0.0, released on 29 Oct (as opposed to the rc3 version you have). Here's the update history: https://github.com/UB-Mannheim/tesseract/wiki Cheers On Saturday, 3 November 2018 03:04:38 UTC+11, Nikhil Kumar wrote: > > Hello. > I am using tesseract

[tesseract-ocr] Re: Tesseract training has an upper limit on the use of cpu?Is the more cpu, the faster the training?

2018-11-25 Thread Junye Li
Hi Bruce, hardware requirements can be found here: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#hardware-software-requirements. Tesseract uses at most 4 cores (or threads, if your CPU supports hyper-threading). I had the training running on a 40-core workstation and it

[tesseract-ocr] Re: Tesseract training has an upper limit on the use of cpu?Is the more cpu, the faster the training?

2018-11-27 Thread Junye Li
I don't think that would be the case unless your training text is a few hundred megabytes in size... I am running Tesseract on Ubuntu 18.04, and based on a very quick test, Tesseract on Ubuntu performed better than on Windows in terms of agreement accuracy (I'm training it for