Looks like I'm all set.

I had to remove -flatten from the command above, and all is working now.
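
For anyone who finds this thread later, here is a minimal sketch of the pipeline that ended up working, written in Python so the two commands can be scripted. The helper names (build_convert_command, build_tesseract_command) are hypothetical and only for illustration; the flags are the ones from the quoted convert command with -flatten dropped. As far as I understand it, -flatten merges all pages into a single image, which would explain why only the last page survived.

```python
import shlex


def build_convert_command(pdf_path, tiff_path):
    """Build the ImageMagick argv from the quoted message, without -flatten.

    Hypothetical helper: it only assembles arguments and does not run anything.
    """
    return [
        "convert",
        "-depth", "4",
        "-density", "300",
        "-background", "white",
        "+matte",  # drop the alpha channel, as in the original command
        pdf_path,
        tiff_path,
    ]


def build_tesseract_command(tiff_path, output_base, lang="eng"):
    """Build the tesseract argv used earlier in the thread (hypothetical helper)."""
    return ["tesseract", tiff_path, output_base, "-l", lang]


if __name__ == "__main__":
    base = "united_states_v._ups_customhouse_brokerage_inc"
    # Print the commands so they can be inspected before running them.
    print(shlex.join(build_convert_command(base + ".pdf", base + "2.tiff")))
    print(shlex.join(build_tesseract_command(base + "2.tiff", base)))
```

To actually run the commands, each list could be passed to subprocess.run(cmd, check=True), assuming convert and tesseract are on the PATH.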

Thanks so much for the help.


On Sun, Feb 3, 2013 at 2:18 PM, Mike Lissner <[email protected]> wrote:

> OK, we're getting somewhere!
>
> I figured out that the Ubuntu repo just doesn't work properly with tiffs,
> and recompiled and installed tesseract and leptonica.
>
> So now when I run tesseract -v, I get:
>
> ↪ tesseract -v
> tesseract 3.02.02
>  leptonica-1.69
>   libjpeg 8b : libpng 1.2.46 : libtiff 3.9.5 : zlib 1.2.3.4
>
> Whereas previously, I didn't get anything mentioning libtiff.
>
> From there, I ran the convert command from the stackoverflow post:
>
> convert -depth 4 -density 300 -background white -flatten +matte
> united_states_v._ups_customhouse_brokerage_inc..pdf
> united_states_v._ups_customhouse_brokerage_inc2.tiff
>
> The resulting file worked well with tesseract, but it only had the last
> page of the PDF...so it's close -- very close -- but not quite there yet.
>
>
> On Sun, Feb 3, 2013 at 2:08 PM, zdenko podobny <[email protected]> wrote:
>
>> BTW: spp means samples per pixel[1]. Are you able to instruct imagick to
>> use 1, 3 or 4?
>> I also found a report on stackoverflow[2]. It mentions that imagick tends
>> to set spp to 2, which should be invalid for tiff...
>>
>> [1] http://tpgit.github.com/Leptonica/tiffio_8c_source.html
>> [2]
>> http://stackoverflow.com/questions/5083492/problem-with-tesseract-and-tiff-format
>>
>> Zdenko
>>
>>
>> On Sun, Feb 3, 2013 at 11:00 PM, zdenko podobny <[email protected]> wrote:
>>
>>> Are you able to generate just one page or a small example? Or can you
>>> provide the steps you use to create it (so I can create it myself)?
>>> Tiff can be tricky. E.g. libtiff-4 does not work for me...
>>>
>>> Zdenko
>>>
>>>
>>> On Sun, Feb 3, 2013 at 10:29 PM, Mike Lissner <
>>> [email protected]> wrote:
>>>
>>>> It's about 300MB, unfortunately, but I generate it programmatically
>>>> using imagemagick in a way that's worked in the past, so I don't think the
>>>> tiff file itself is the issue.
>>>>
>>>> If you're willing to download this monster, I'll post it to dropbox.
>>>> I'd love the help, but I don't think it's the right problem.
>>>>
>>>>
>>>> On Sun, Feb 3, 2013 at 1:16 PM, zdenko podobny <[email protected]> wrote:
>>>>
>>>>> Can you send an example of your tif file?
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Sun, Feb 3, 2013 at 10:08 PM, Michael Lissner <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I have Ubuntu 12.04, which has tesseract 3.02 and leptonica version
>>>>>> 1.69.
>>>>>>
>>>>>> I've installed these, and also installed libtiff4 using apt-get.
>>>>>>
>>>>>> When I try to process a document, I get:
>>>>>>
>>>>>> ↪ sudo tesseract united_states_v._ups_customhouse_brokerage_inc.tif
>>>>>> united_states_v._ups_customhouse_brokerage_inc -l eng
>>>>>> Tesseract Open Source OCR Engine v3.02 with Leptonica
>>>>>> Error in pixReadFromTiffStream: spp not in set {1,3,4}
>>>>>> Error in pixReadStreamTiff: pix not read
>>>>>> Error in pixReadStream: tiff: no pix returned
>>>>>> Error in pixRead: pix not read
>>>>>> Unsupported image type.
>>>>>>
>>>>>>
>>>>>> Which seems baffling to me. I've tried reinstalling leptonica,
>>>>>> reinstalling the tiff libraries, and reinstalling tesseract in the hope
>>>>>> that they'd support tiffs once reinstalled. So far, nothing is helping.
>>>>>>
>>>>>> I was hoping that Ubuntu 12.04 would support everything I needed it
>>>>>> to without having to compile from source, but so far I've had bad luck.
>>>>>> Is there a way to make this work?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To post to this group, send email to [email protected]
>>>>>> To unsubscribe from this group, send email to
>>>>>> [email protected]
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>
>


