On Thu, Apr 19, 2012 at 12:59 PM, droehn <[email protected]> wrote:

> Hi all,
>
> I already posted this question on Stack Overflow, but no one seems to
> have a hint at hand. That's why I am reposting it in this dedicated group:
>
> I use Ghostscript to strip images from PDF files into jpg and run
> Tesseract to save txt content like this:
>
> - Ghostscript located in c:\engine\gs\
> - Tesseract located in c:\engine\tesseract\
> - web located pdf/jpg/txt dir = file/tmp/
>
> Code:
>
> $pathgs = "c:\\engine\\gs\\";
> $pathtess = "c:\\engine\\tesseract\\";
> $pathfile = "file/tmp/";
>
> // Strip images
> putenv("PATH=".$pathgs);
> $exec = "gs -dNOPAUSE -sDEVICE=jpeg -r300 -sOutputFile=".
> $pathfile."strip%d.jpg ".$pathfile."upload.pdf -q -c quit";
> shell_exec($exec);
>
>  // OCR
> putenv("PATH=".$pathtess);
> $exec = "tesseract.exe '".$pathfile."strip1.jpg' '".$pathfile."ocr' -l eng";
> exec($exec, $msg);
> print_r($msg);
> echo file_get_contents($pathfile."ocr.txt");
>
> Stripping the image (it's just 1 page) works fine, but Tesseract
> echoes:
> Array
>  (
>     [0] => Tesseract Open Source OCR Engine v3.01 with Leptonica
>     [1] => Cannot open input file: 'file/tmp/strip1.jpg'
>  )
>
> and no ocr.txt file is generated, which leads to a 'failed to open
> stream' error in PHP.
>
> - Copying strip1.jpg into the c:/engine/tesseract/ folder and running
> Tesseract from the command line (tesseract strip1.jpg ocr.txt -l eng)
> works without any issue.
> - Replacing the putenv() call with exec(c:/engine/tesseract/
> tesseract ... ) returns the above-mentioned error.
> - Keeping strip1.jpg in the Tesseract folder and running exec(tesseract
> 'c:/engine/tesseract/strip1.jpg' ... ) returns the above-mentioned error.
> - Leaving out the apostrophes around path/strip1.jpg returns an empty
> array as the message and does not create the ocr.txt file.
> - Writing the command directly into the exec() call instead of using
> $exec makes no difference.
>
> What puzzles me most is that I have to put the file path in
> apostrophes, or I do not get any feedback from Tesseract at all. In all
> the examples I have seen on the net, no apostrophes were needed. Maybe
> this is the direction I have to look in for a solution? I would be glad
> for any help.
>
>
First of all: try replacing the ' characters with " in the exec statement.
AFAIK Windows accepts only " around a file path (' is treated as part of
the filename; tested on Windows XP).
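
As a rough sketch of that quoting change, using the paths from the code
quoted above: quote_win_path() is a hypothetical helper I made up for
illustration, not part of PHP or Tesseract, and I have not run this against
your setup.

```php
<?php
// Hypothetical helper: wrap a path in double quotes, the quoting style
// that the Windows command interpreter understands.
function quote_win_path(string $path): string
{
    // Drop any embedded double quotes so the argument stays a single token.
    return '"' . str_replace('"', '', $path) . '"';
}

$pathfile = "file/tmp/";

// Same command as in the original post, with " instead of ' around the paths.
$exec = "tesseract.exe " . quote_win_path($pathfile . "strip1.jpg") . " "
      . quote_win_path($pathfile . "ocr") . " -l eng";

echo $exec, PHP_EOL;
// exec($exec, $msg);   // uncomment to actually run Tesseract
```

PHP's built-in escapeshellarg() also wraps its argument in double quotes
on Windows, so it may be a reasonable alternative to a hand-written helper.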

If that does not help, try replacing "tesseract" with "dir". Do you get a
dir error message (e.g. File Not Found) or the standard dir information
about the file?
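
A minimal sketch of that dir check, again with the path from the quoted
code. dir_check_command() is a hypothetical helper name. I am assuming the
relative path is resolved against PHP's current working directory, which is
why the sketch prints getcwd() as well.

```php
<?php
// Hypothetical helper: same double-quoted argument, but with "dir" in
// place of tesseract.exe.
function dir_check_command(string $path): string
{
    // exec() only captures stdout, so 2>&1 folds error messages into it.
    return 'dir "' . $path . '" 2>&1';
}

$pathfile = "file/tmp/";
$cmd = dir_check_command($pathfile . "strip1.jpg");

echo "cwd: ", getcwd(), PHP_EOL;   // directory that relative paths start from
echo "cmd: ", $cmd, PHP_EOL;

if (PHP_OS_FAMILY === 'Windows') {  // dir is a cmd.exe built-in
    exec($cmd, $msg);
    print_r($msg);
}
```

If dir cannot find the file either, the problem is probably the path or the
working directory rather than Tesseract itself.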

-- 
Zdenko
