OK, we're getting somewhere!

I figured out that the tesseract and leptonica packages in the Ubuntu repo
just don't work properly with tiffs, so I recompiled and installed both.

So now when I run tesseract -v, I get:

↪ tesseract -v
tesseract 3.02.02
 leptonica-1.69
  libjpeg 8b : libpng 1.2.46 : libtiff 3.9.5 : zlib 1.2.3.4

Whereas previously, I didn't get anything mentioning libtiff.
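
For anyone wanting to script that check, here is a minimal sketch. The helper name mentions_libtiff is made up for illustration; it only greps whatever version banner it is given on stdin, and says nothing about whether the TIFF reader actually works.

```shell
# Succeeds (exit status 0) if the version banner on stdin mentions libtiff.
# mentions_libtiff is a hypothetical helper name, not part of tesseract.
mentions_libtiff() {
  grep -qi 'libtiff'
}
```

Usage would be something like `tesseract -v 2>&1 | mentions_libtiff && echo "libtiff is listed"`, merging stderr in case the banner is printed there.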

From there, I ran the convert command from the stackoverflow post:

convert -depth 4 -density 300 -background white -flatten +matte \
  united_states_v._ups_customhouse_brokerage_inc..pdf \
  united_states_v._ups_customhouse_brokerage_inc2.tiff

The resulting file worked well with tesseract, but it only had the last
page of the PDF...so it's close -- very close -- but not quite there yet.
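
The single-page result is most likely down to -flatten, which composites every image in the sequence (here, every PDF page) into one image, so the last page ends up on top. One possible workaround, sketched below, is to convert one page at a time with ImageMagick's file.pdf[N] page selector (zero-based), so that -flatten only ever sees a single page. The function name pdf_to_tiffs, the CONVERT override, the output file naming, and passing the page count by hand are all assumptions for illustration, and this is untested against the PDF above.

```shell
# Convert each PDF page to its own TIFF so -flatten never merges pages.
# Hypothetical helper: $1 is the PDF, $2 is the page count (supplied by hand).
# Set CONVERT=echo to preview the commands instead of running them.
pdf_to_tiffs() {
  pdf=$1
  pages=$2
  base=${pdf%.pdf}
  i=0
  while [ "$i" -lt "$pages" ]; do
    # -density goes before the input so it applies when the page is rendered;
    # "file.pdf[N]" selects page N, counting from zero.
    "${CONVERT:-convert}" -density 300 "${pdf}[${i}]" \
      -depth 4 -background white -flatten +matte \
      "${base}_p${i}.tiff" || return 1
    i=$((i + 1))
  done
}
```

Running `CONVERT=echo pdf_to_tiffs some.pdf 3` prints the three convert commands without executing them, which is a cheap way to check the file names before processing a large document. Each resulting TIFF can then be passed to tesseract on its own.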


On Sun, Feb 3, 2013 at 2:08 PM, zdenko podobny <[email protected]> wrote:

> BTW: spp means Samples-per-pixel[1]. Are you able to instruct imagick to
> use 1, 3 or 4?
> I also found a report on stackoverflow[2] mentioning that imagick sets
> spp to 2, which should be invalid for tiff...
>
> [1] http://tpgit.github.com/Leptonica/tiffio_8c_source.html
> [2]
> http://stackoverflow.com/questions/5083492/problem-with-tesseract-and-tiff-format
>
> Zdenko
>
>
> On Sun, Feb 3, 2013 at 11:00 PM, zdenko podobny <[email protected]> wrote:
>
>> Are you able to generate just one page or a small example? Or can you
>> provide the steps you used to create it (so I can create it)?
>> Tiff can be tricky. E.g. libtiff-4 does not work for me...
>>
>> Zdenko
>>
>>
>> On Sun, Feb 3, 2013 at 10:29 PM, Mike Lissner <
>> [email protected]> wrote:
>>
>>> It's about 300MB, unfortunately, but I generate it programmatically
>>> using imagemagick in a way that's worked in the past, so I don't think the
>>> tiff file itself is the issue.
>>>
>>> If you're willing to download this monster, I'll post it to dropbox. I'd
>>> love the help, but I don't think it's the right problem.
>>>
>>>
>>> On Sun, Feb 3, 2013 at 1:16 PM, zdenko podobny <[email protected]> wrote:
>>>
>>>> Can you send an example of your tif file?
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Sun, Feb 3, 2013 at 10:08 PM, Michael Lissner <
>>>> [email protected]> wrote:
>>>>
>>>>> I have Ubuntu 12.04, which has tesseract 3.02 and leptonica version
>>>>> 1.69.
>>>>>
>>>>> I've installed these, and also installed libtiff4 using apt-get.
>>>>>
>>>>> When I try to process a document, I get:
>>>>>
>>>>> ↪ sudo tesseract united_states_v._ups_customhouse_brokerage_inc.tif
>>>>> united_states_v._ups_customhouse_brokerage_inc -l eng
>>>>> Tesseract Open Source OCR Engine v3.02 with Leptonica
>>>>> Error in pixReadFromTiffStream: spp not in set {1,3,4}
>>>>> Error in pixReadStreamTiff: pix not read
>>>>> Error in pixReadStream: tiff: no pix returned
>>>>> Error in pixRead: pix not read
>>>>> Unsupported image type.
>>>>>
>>>>>
>>>>> Which seems baffling to me. I've tried reinstalling leptonica,
>>>>> reinstalling the tiff libraries, and reinstalling tesseract in the hope
>>>>> that they'd support tiffs once reinstalled. So far, nothing is helping.
>>>>>
>>>>> I was hoping that Ubuntu 12.04 would support everything I needed it to
>>>>> without having to compile from source, but so far I've had bad luck. Is
>>>>> there a way to make this work?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mike
>>>>>
>>>>> --
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected]
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>
>>>>>
>>>>>
>>>>
>>>>  --
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected]
>>>> To unsubscribe from this group, send email to
>>>> [email protected]
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>>
>>>>
>>>
>>>  --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>
>>>
>>
>>
