Hi all,

I already posted this question on Stack Overflow, but no one seems to
have a hint at hand. That's why I am reposting it in this dedicated group:

I use Ghostscript to strip images from PDF files into JPG files and
then run Tesseract to save the text content, like this:

- Ghostscript located in c:\engine\gs\
- Tesseract located in c:\engine\tesseract\
- Web-accessible directory for the pdf/jpg/txt files: file/tmp/

Code:

$pathgs = "c:\\engine\\gs\\";
$pathtess = "c:\\engine\\tesseract\\";
$pathfile = "file/tmp/";

// Strip images: render each page of upload.pdf to strip<N>.jpg at 300 dpi
putenv("PATH=".$pathgs);
$exec = "gs -dNOPAUSE -sDEVICE=jpeg -r300 -sOutputFile=".
$pathfile."strip%d.jpg ".$pathfile."upload.pdf -q -c quit";
shell_exec($exec);

// OCR: run Tesseract on the first page; the output base "ocr" becomes ocr.txt
putenv("PATH=".$pathtess);
$exec = "tesseract.exe '".$pathfile."strip1.jpg' '".$pathfile."ocr' -l eng";
exec($exec, $msg);
print_r($msg);
echo file_get_contents($pathfile."ocr.txt");

Stripping the image (it's just one page) works fine, but Tesseract
echoes:
Array
  (
     [0] => Tesseract Open Source OCR Engine v3.01 with Leptonica
     [1] => Cannot open input file: 'file/tmp/strip1.jpg'
  )

and no ocr.txt file is generated, which leads to a 'failed to open
stream' error in PHP.

- Copying strip1.jpg into the c:/engine/tesseract/ folder and running
Tesseract from the command line (tesseract strip1.jpg ocr.txt -l eng)
works without any issue.
- Replacing the putenv() call with exec(c:/engine/tesseract/tesseract ... )
returns the above-mentioned error.
- Keeping strip1.jpg in the Tesseract folder and running
exec(tesseract 'c:/engine/tesseract/strip1.jpg' ... ) returns the
above-mentioned error.
- Leaving out the apostrophes around path/strip1.jpg returns an empty
array as the message and does not create the ocr.txt file.
- Writing the command directly into the exec() call instead of using
$exec doesn't make a difference.
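
To rule out a working-directory problem, a minimal diagnostic sketch
could look like this. It assumes the relative path file/tmp/ might be
resolved against a different directory than I expect; the helper name
describeInputFile is made up for illustration and is not part of my
script.

```php
<?php
// Hypothetical diagnostic helper; not part of the original script.
// Reports how PHP itself sees a path before it is handed to exec().
function describeInputFile($path)
{
    return array(
        'cwd'      => getcwd(),           // directory that relative paths resolve against
        'exists'   => file_exists($path), // can PHP see the file at all?
        'absolute' => realpath($path),    // false when the path cannot be resolved
    );
}

print_r(describeInputFile('file/tmp/strip1.jpg'));
```

If 'exists' is true here but Tesseract still cannot open the file, the
path itself is probably fine and the problem is more likely in how the
command line is quoted.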

What puzzles me most is that I have to wrap the file path in
apostrophes, or I do not get any feedback from Tesseract at all. In all
the examples I have seen on the net, no apostrophes were needed. Maybe
this is the direction I have to head in for a solution? I would be glad
for any help.
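
One direction I can sketch, as an assumption rather than a confirmed
fix: the Windows command shell does not treat apostrophes as quote
characters, so they may reach Tesseract as part of the file name, which
would match the quoted name in the error message. PHP's
escapeshellarg() adds the quoting that the platform's shell expects
(double quotes on Windows). The helper name buildTesseractCommand is
hypothetical, and the 2>&1 redirection is only there so that exec()
also captures anything written to stderr.

```php
<?php
// Hypothetical helper; not part of the original script.
// Builds the Tesseract command line and lets escapeshellarg() choose the quoting.
function buildTesseractCommand($image, $outputBase, $lang = 'eng')
{
    return 'tesseract '
        . escapeshellarg($image) . ' '
        . escapeshellarg($outputBase)
        . ' -l ' . escapeshellarg($lang)
        . ' 2>&1'; // merge stderr into stdout so exec() can capture it
}

// Assumes PATH already points at the Tesseract folder, as in the putenv() call.
$cmd = buildTesseractCommand('file/tmp/strip1.jpg', 'file/tmp/ocr');
echo $cmd, "\n";
```

The resulting string could then be passed to exec($cmd, $msg, $exitCode),
where the third argument receives Tesseract's exit code, so a failure
would be visible even when $msg comes back empty.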

Thanks & brgds
David
