[tesseract-ocr] Re: Failed loading language 'eng'

2020-03-11 Thread Jeremiah
re it can find traineddata file. > > Regards > PD > > On Wednesday, March 11, 2020 at 1:10:13 AM UTC+5:30, Jeremiah wrote: >> >> I am getting this error when running some userbot java code on my Win10 >> machine which utilizes tesseract for extracting words from th

[tesseract-ocr] Re: Failed loading language 'eng'

2020-03-11 Thread Jeremiah
Yes, I've tried both C:\Program Files\Tesseract-OCR and C:\Program Files\Tesseract-OCR\tessdata, and neither one works for me.
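
One way to narrow this down is to confirm which of the candidate directories actually holds the language file before pointing Tesseract at it. Tesseract language data lives in files named `<lang>.traineddata`, so for `eng` the file to look for is `eng.traineddata`. The sketch below is only a diagnostic aid: `find_traineddata` is a hypothetical helper, not part of Tesseract or SikuliX, and the two paths are simply the ones mentioned in the message above.

```python
from pathlib import Path


def find_traineddata(candidates, lang="eng"):
    """Return the first candidate directory that contains <lang>.traineddata.

    Hypothetical helper for illustration; returns None if no candidate matches.
    """
    for directory in candidates:
        directory = Path(directory)
        if (directory / f"{lang}.traineddata").is_file():
            return directory
    return None


if __name__ == "__main__":
    # The two locations tried in the message above.
    candidates = [
        r"C:\Program Files\Tesseract-OCR",
        r"C:\Program Files\Tesseract-OCR\tessdata",
    ]
    hit = find_traineddata(candidates)
    if hit is None:
        print("eng.traineddata was not found in any candidate directory")
    else:
        print(f"eng.traineddata found in: {hit}")
```

If neither directory contains the file, the "Failed loading language" error would be expected regardless of which path is configured.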

Re: [tesseract-ocr] Re: Failed loading language 'eng'

2020-03-11 Thread Jeremiah
So I did download the latest version of the trained data file and tried it, but it didn't work. From what I can find, the actual Java code never creates a Tesseract object; instead, the bots create a Region in Sikulix, which then calls collectWordsText(). This is the code for reference.

Re: [tesseract-ocr] Re: Post OCR Verification and Editing

2024-04-11 Thread Jeremiah
he PDF (from simple [type 1] to composite [type 0]). However, I am already working on implementing this as it is required to support non-Latin languages, so it will probably be possible to add characters outside of the Windows-1252 set at some point in the next month. -Jeremiah On Wednesday, Ap

Re: [tesseract-ocr] Re: Textbook-like format. Correcting improperly recognized text

2024-05-01 Thread Jeremiah
way, even if it fixes everything else) I do have questions about > tweaking the command as well, just haven't asked them yet > > On Mon, Apr 29, 2024, 12:36 Jeremiah wrote: > >> Regarding proofreading with Scribe OCR <https://scribeocr.com/>, it is >> definitely po

[tesseract-ocr] Re: Tesseract fine tuning questions

2024-05-11 Thread Jeremiah
I don't know the answer to most of these questions; however, one thing I noticed in your question was the addition of rotation within the training data for better performance on scanned documents. This may imply that the scanned documents being fed to Tesseract are also rotated. Tesseract

[tesseract-ocr] Re: Textbook-like format. Correcting improperly recognized text

2024-04-29 Thread Jeremiah
Regarding proofreading with Scribe OCR, it is definitely possible to zoom in. The controls are virtually identical to popular document viewer programs like Acrobat. You can zoom in on the current location of the mouse using Control + Mouse Wheel, scroll using the

Re: [tesseract-ocr] Re: Post OCR Verification and Editing

2024-03-31 Thread Jeremiah
wrote: > Hello Jeremiah, > > this looks very interesting and nice app. Any instructions for > installation? > > I just downloaded code from GH but recognizing text doesn't work for me: > > [image: image.png] > > BR, > > > Zdenko > > > so 30. 3. 20

[tesseract-ocr] Re: Is there a good way to change the recognition rate for such images?

2024-04-05 Thread Jeremiah
I do not believe training would have any impact on whether or not the column layout is correctly identified during the page segmentation step. I have similarly experienced the issue with single-digit columns being misidentified as vertical text when running with PSMs that use automatic page

[tesseract-ocr] Re: Recognition when font is known

2024-04-07 Thread Jeremiah
Cropping the image to only include the relevant area can significantly improve performance in cases where recognition was poor due to image processing or layout analysis failing. An indicator that this is happening is if words are missing entirely from the final output (rather than being

Re: [tesseract-ocr] Re: Manual review and correction for characters outside of the Latin-1 character set

2024-06-09 Thread Jeremiah
OCR allows for some preprocessing steps that improve recognition. 1. Currently the only supported preprocessing steps are auto-rotate and upscaling, and the only step turned on by default is auto-rotate. 1. Additional control of pre-processing could be added. -Jeremiah On Su