On Fri, 12 Jan 2024, 14:08 Oliver Saintilien, <[email protected]>
wrote:
> Something else I tried was this
> const tesseract = require("node-tesseract-ocr")
>
> tesseract
> .recognize(`C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\
> Document_20240109_0014.jpg`, {
> lang: "eng",
> oem: 1,
> psm: 0,
> "tessdata-dir": "C:\\Program Files\\Tesseract-OCR\\tessdata"
> })
>
> Thats when I get the error about the Tessdata env var. I have pasted it
> below:
>
> Command failed: tesseract "C:\Users\osain\OneDrive\Desktop\1992
> Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3
> --tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
> Error opening data file C:\Program/eng.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to your
> "tessdata" directory.
>
Adding to Zdenko's answer: what you need to do is fix / patch
node-tesseract-ocr so that it correctly converts paths with spaces, as
specified in the js config struct, into properly escaped, operating system
dependent commandline arguments for the tesseract executable that it
invokes. (Or file a bug report there and see if someone else does it for
you; since this is open source I suggest fork+fix+pullreq at
node-tesseract-ocr instead ;-) )
The quickest fix would be to wrap the --tessdata-dir path argument in double
quotes, which fixes most path issues, including yours, on MS Windows (as
long as the path itself is not adversarial, i.e. does not contain a double
quote of its own).
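
A minimal sketch of what such quoting could look like in JavaScript. Note
that quoteArg is a hypothetical helper name of mine, not something that
exists in node-tesseract-ocr, and I am assuming the usual MS Windows
argument parsing rules (double quotes group an argument; backslashes only
matter when they directly precede a double quote):

```javascript
// Hypothetical helper, NOT part of node-tesseract-ocr: wraps a single
// commandline argument in double quotes when it needs them.
function quoteArg(arg) {
  const s = String(arg);
  // The "adversarial" case: refuse paths that contain a double quote
  // themselves rather than trying to escape them here.
  if (s.includes('"')) {
    throw new Error("argument contains a double quote: " + s);
  }
  // Arguments without whitespace can be passed as-is.
  if (s !== "" && !/\s/.test(s)) {
    return s;
  }
  // Trailing backslashes would otherwise escape the closing quote,
  // so double them before wrapping.
  return '"' + s.replace(/\\+$/, (m) => m + m) + '"';
}

const tessdataDir = "C:\\Program Files\\Tesseract-OCR\\tessdata";
console.log("--tessdata-dir " + quoteArg(tessdataDir));
```

An alternative that avoids hand-quoting altogether would be to pass the
arguments as an array to Node's child_process.execFile, which does not go
through a shell; I have not checked how easily node-tesseract-ocr could be
restructured that way.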
In other words: currently node-tesseract-ocr produces this commandline, as
reported by you:
tesseract "C:\Users\osain\OneDrive\Desktop\1992
Spring\Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3
--tessdata-dir C:\Program Files\Tesseract-OCR\tessdata
which is interpreted like this (extra newlines added to show the arguments
separated):
tesseract
"C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
stdout
-l eng
--oem 1
--psm 3
--tessdata-dir C:\Program
Files\Tesseract-OCR\tessdata
so tesseract receives a damaged --tessdata-dir path (C:\Program, hence the
"Error opening data file C:\Program/eng.traineddata" message) PLUS a surplus
argument that it apparently ignored: "Files\Tesseract-OCR\tessdata".
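
To see the split for yourself, here is a rough sketch of a commandline
splitter. splitArgs is a hypothetical, deliberately simplified function of
mine (whitespace separates arguments, double quotes group them, no
backslash-escape handling), so it only approximates what the real MS
Windows parsing does:

```javascript
// Hypothetical, simplified splitter for illustration only:
// whitespace separates arguments unless inside double quotes.
function splitArgs(commandLine) {
  const args = [];
  let current = "";
  let inQuotes = false;
  let hasToken = false;
  for (const ch of commandLine) {
    if (ch === '"') {
      inQuotes = !inQuotes; // quotes group text, they are not kept
      hasToken = true;
    } else if (/\s/.test(ch) && !inQuotes) {
      if (hasToken) {
        args.push(current);
        current = "";
        hasToken = false;
      }
    } else {
      current += ch;
      hasToken = true;
    }
  }
  if (hasToken) args.push(current);
  return args;
}

// The commandline as reported in the error message.
const brokenLine =
  'tesseract "C:\\Users\\osain\\OneDrive\\Desktop\\1992 Spring\\' +
  'Document_20240109_0014.jpg" stdout -l eng --oem 1 --psm 3 ' +
  "--tessdata-dir C:\\Program Files\\Tesseract-OCR\\tessdata";

const brokenArgs = splitArgs(brokenLine);
// The quoted image path survives as one argument; the unquoted
// tessdata path is cut in two at the space.
console.log(brokenArgs.slice(-3));
```
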
What SHOULD have been generated by node-tesseract-ocr is this (with extra
newlines again):
tesseract
"C:\Users\osain\OneDrive\Desktop\1992 Spring\Document_20240109_0014.jpg"
stdout
-l eng
--oem 1
--psm 3
--tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"
as was intended in the js code.
HTH,
Ger