Alas, MATLAB wraps tesseract with its ocr() function and doesn't seem to allow passing parameters related to dictionary words to the underlying function. Perhaps I'll have to bypass their implementation and call a separate installation of tesseract directly.
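For anyone who wants to try the direct route, here is a rough sketch of calling a standalone tesseract from a script with the dictionaries turned off. It is only an illustration under several assumptions: a recent tesseract (4.x or later) is on the PATH, that build accepts `--psm` and `-c name=value` on the command line (3.x releases spell the page segmentation flag `-psm`), and `load_system_dawg` / `load_freq_dawg` are the config variables that disable the word lists. The function names and the default `psm=6` (treat the image as a single block of text) are my own choices, not anything MATLAB or tesseract prescribes.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, psm=6, use_dictionary=False):
    """Build a tesseract CLI command that writes recognized text to stdout."""
    # "stdout" as the output base makes tesseract print the text instead of
    # writing a .txt file. "--psm" is the 4.x+ spelling of the flag.
    cmd = ["tesseract", image_path, "stdout", "--psm", str(psm)]
    if not use_dictionary:
        # Assumption: these two variables switch off the system and
        # frequent-word dictionaries in the installed tesseract version.
        cmd += ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]
    return cmd


def run_ocr(image_path, **kwargs):
    """Run tesseract on image_path and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

Something like `run_ocr("extracted.png")` would then return the text with the dictionaries disabled, and `run_ocr("extracted.png", use_dictionary=True)` gives a baseline to compare the slash recognition against. From MATLAB, the same command line could presumably be issued through `system()` instead.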

On Friday, February 2, 2018 at 3:49:56 AM UTC-6, James Q wrote:
>
> Assuming you are using eng.traineddata - have you tried using it with the
> dictionary off or just using osd.traineddata?
>
> On Friday, February 2, 2018 at 8:56:23 AM UTC, Scott Stekel wrote:
>>
>> In the attached images (original and preprocessed before OCR), I have
>> some lines of text which include the following:
>>
>> <https://lh3.googleusercontent.com/-bEabWrR7PoQ/WnN5U01xU6I/AAAAAAAB2-A/vhcV6AUqqDgd440kr8kP-Ko5NUM-WU2dACLcBGAs/s1600/extracted.png>
>>
>> S/A 2/2
>> Map G/1
>>
>> Using tesseract 3.02 (under the covers of MATLAB R2017b), when this image
>> is analyzed as a block, I get inconsistent recognition of the slashes:
>>
>> SIA 2/2
>> Map GI1
>>
>> I find it interesting that the first slash in S/A is interpreted
>> differently from the slash in 2/2.
>>
>> Any suggestions for how to get the slashes to be recognized correctly
>> everywhere?
>>
>> Thank you.