Sorry, you did not say this. I mixed some things up when reading up on the
issue this morning. This is what I was referring to:
According to IrfanView, the image is an LZW-compressed TIF file at 300 DPI.
What Quan says is correct: the image is a heavily compressed TIF.
In my experience, Tesseract-OCR supports only uncompressed TIF files.
Sriranga(78yrsold)
Thanks for pointing it out.
Mike
From: zdenko podobny [mailto:[email protected]]
Sent: Monday, 28 March 2011 14:34
To: Lutz, Michael
Cc: Dmitri Silaev; [email protected]; Richard Genthner
Subject: Re: tesseract.exe has stopped working on win2008 r2
On Mon, Mar 28, 2011 at 11:54 AM, Lutz, Michael
<[email protected]<mailto:[email protected]>> wrote:
Hi All,
So the image Richard gave us is a compressed TIF file. Since tesseract only
supports uncompressed TIF images, as Zdenko noted, you will not get any
results from this image.
Incorrect:
1. Image support is the task of Leptonica, so the list of supported formats can
be found on the Leptonica website and in its source code. I think we really need
to make this distinction, because upgrading Leptonica could add support for a
new format without changing a line of tesseract code.
2. I guessed that Leptonica has a problem with TIFF files using "lzw
compression". When I created a TIFF with "zip compression" it worked (other
compression algorithms are also available in TIFF: PackBits, G4, G3, ...). I
never said that Leptonica (tesseract) supports only uncompressed TIFF. I am
sorry if I was not clear about this.
3. As TP corrected me: the problem is not the LZW compression but "Samples per
Pixel". Leptonica supports 1, 3 and 4. The input image used 2 (unsupported). To
"solve" this, just open the input file in IrfanView and save it as TIFF with LZW
compression. It will change "Samples/Pixel" to 1 automatically ;-)
Zdenko
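As an illustration of point 3, the "Samples per Pixel" and compression values
can be read straight from the TIFF header, without Leptonica. The following is
only a minimal Python sketch under stated assumptions, not part of tesseract or
Leptonica: the function names read_tiff_tags and spp_supported are hypothetical,
only the first image directory of a classic (non-BigTIFF) file is inspected,
and the supported set {1, 3, 4} is taken from the Leptonica 1.67 error message
quoted further down in this thread. Tag numbers 259 (Compression) and 277
(SamplesPerPixel) come from the TIFF 6.0 specification.

```python
import struct

# Assumption: taken from the Leptonica 1.67 message "spp not in set {1,3,4}".
SUPPORTED_SPP = {1, 3, 4}

TAG_COMPRESSION = 259        # 1 = none, 5 = LZW, 8 = Deflate ("zip"), 32773 = PackBits
TAG_SAMPLES_PER_PIXEL = 277  # defaults to 1 when the tag is absent


def read_tiff_tags(data, wanted=(TAG_COMPRESSION, TAG_SAMPLES_PER_PIXEL)):
    """Hypothetical helper: return {tag: value} for single-valued SHORT/LONG
    tags found in the first IFD of a classic TIFF given as bytes."""
    if data[:2] == b"II":
        order = "<"   # little-endian
    elif data[:2] == b"MM":
        order = ">"   # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (n_entries,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(n_entries):
        start = ifd_offset + 2 + 12 * i   # each IFD entry is 12 bytes
        tag, typ, count = struct.unpack(order + "HHI", data[start:start + 8])
        if tag not in wanted or count != 1:
            continue
        if typ == 3:    # SHORT: value is left-justified in the 4-byte field
            (value,) = struct.unpack(order + "H", data[start + 8:start + 10])
        elif typ == 4:  # LONG
            (value,) = struct.unpack(order + "I", data[start + 8:start + 12])
        else:
            continue
        tags[tag] = value
    return tags


def spp_supported(tags):
    """Hypothetical check against the set quoted in the Leptonica error."""
    return tags.get(TAG_SAMPLES_PER_PIXEL, 1) in SUPPORTED_SPP


# Usage on a real file (the path is a placeholder):
#     with open("input.tif", "rb") as f:
#         tags = read_tiff_tags(f.read())
#     print(tags, spp_supported(tags))
```

For the image discussed here, one would expect tag 277 to come back as 2, which
is outside the supported set regardless of what tag 259 says about compression.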
I attached the image as an uncompressed TIF file (see uncompressed.zip); this
image is processed by tesseract without any problems.
Also attached is tesseract.zip, which should unpack to a file named
tesseract.executable; just rename it to tesseract.exe if it went through. It is
a static release build made on Win7 with WinSDK 7.1, if anyone still wants it.
Regards,
Mike
-----Original Message-----
From: Dmitri Silaev [mailto:[email protected]]
Sent: Saturday, 26 March 2011 22:04
To: [email protected]
Cc: zdenko podobny; Lutz, Michael; Richard Genthner
Subject: Re: tesseract.exe has stopped working on win2008 r2
Guys, I still can't understand what error is produced by
Tesseract. Let's wait for the error screenshot. Or did you understand
everything already? Richard says he's got an error message...
Warm regards,
Dmitri Silaev
On Sat, Mar 26, 2011 at 5:42 PM, zdenko podobny
<[email protected]<mailto:[email protected]>> wrote:
>
>
> On Fri, Mar 25, 2011 at 5:40 PM, Lutz, Michael
> <[email protected]<mailto:[email protected]>> wrote:
>>
>> Hi,
>>
>> I just ran your tif file and I get no results; it must have something to do
>> with the size of the image. If I run a portion of the TIFF smaller than
>> 1000x1000, then I get results.
>>
>> Can somebody explain why a TIF of this size (2480x3508 @ 8BPP) is not
>> processed?
>
> This is not a tesseract issue but a Leptonica one (the library used for image
> handling). When I ran it on Linux I got error messages coming from Leptonica
> (1.67; I did not try 1.68 on Linux yet):
> Error in pixReadFromTiffStream: spp not in set {1,3,4}
> Error in pixReadStreamTiff: pix not read
> Error in pixReadTiff: pix not read
> On Windows, the Leptonica "release version" library did not show error/warning
> messages because of the compile option "NO_CONSOLE_IO"
> (see http://code.google.com/p/leptonica/issues/detail?id=42).
> It looks like Leptonica does not support LZW compression for TIFF
> (see http://www.leptonica.com/source/README.html "9. Image I/O" - lzw is
> mentioned in the PNG and GIF sections, but not with TIF). When I changed the
> TIF compression from LZW to zip (BTW: this also produces a smaller image),
> tesseract produced output (on XP SP3).
> Zdenko
>
>> Mike
>>
>>
>>
>> From: Richard Genthner
>> [mailto:[email protected]]
>> Sent: Friday, 25 March 2011 17:04
>> To: Lutz, Michael
>> Cc: [email protected]
>>
>> Subject: Re: tesseract.exe has stopped working on win2008 r2
>>
>>
>>
>> Here is the screenshot and the tif file. Dmitri, if you rename the .exe
>> that should work. I'm trying to get the training data up.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to
>> [email protected]<mailto:[email protected]>.
>> To unsubscribe from this group, send email to
>> [email protected]<mailto:tesseract-ocr%[email protected]>.
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>
--- Begin Message ---
According to IrfanView, the image is an LZW-compressed TIF file at 300 DPI.
What Quan says is correct: the image is a heavily compressed TIF.
In my experience, Tesseract-OCR supports only uncompressed TIF files.
On Sat, Mar 26, 2011 at 6:17 PM, Quan Nguyen
<[email protected]<mailto:[email protected]>> wrote:
The image appears to have been heavily compressed. Running OCR on the whole
image did not yield anything. Doing it blockwise, I got some results, but not
very accurate ones:
Ch Juhe 24, 2@@9 the ACHP vctect ct: revisect teccmmehdettcns tcr
mee_s1es-muhqes-t'ube[[e (NR/H~
‘evictetnce ct tmmuhity’ requtrementstcr heetthcete teefschheh‘. The
Heatthcate thtecttctn Ochtrct
Ptectices Aciviscry Ccmrmttee (HHCPAG) has ernctcfsed these changes.
--- End Message ---