Looks like I'm all set. I had to remove -flatten from the command above, and all is working now.
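
For anyone who finds this thread later, here is a rough sketch of the working command, wrapped in a small shell script. The options are the ones from the convert command quoted in this thread with -flatten taken out; the function names (convert_opts, pdf_to_tiff) and the file names in the usage comment are placeholders made up for illustration. My understanding is that -flatten composites every page into a single image, which would explain why the earlier TIFF only showed the last page.

```shell
#!/bin/sh
# Sketch: turn a multi-page PDF into a multi-page TIFF for tesseract.
# Function and file names are hypothetical, not from the thread.

# Print the convert options from the thread, with -flatten removed.
convert_opts() {
    printf '%s\n' "-depth 4 -density 300 -background white +matte"
}

# $1: input PDF, $2: output TIFF
pdf_to_tiff() {
    # Word splitting of the option string is intended here.
    convert $(convert_opts) "$1" "$2"
}

# Usage (assumes ImageMagick and tesseract are installed):
#   pdf_to_tiff input.pdf output.tiff
#   tesseract output.tiff output -l eng
```

Keeping the options in one place makes it easy to check that -flatten has not crept back in when the script is reused.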
Thanks so much for the help.

On Sun, Feb 3, 2013 at 2:18 PM, Mike Lissner <[email protected]> wrote:

> OK, we're getting somewhere!
>
> I figured out that the Ubuntu repo just doesn't work properly with tiffs,
> and recompiled and installed tesseract and leptonica.
>
> So now when I run tesseract -v, I get:
>
> ↪ tesseract -v
> tesseract 3.02.02
> leptonica-1.69
> libjpeg 8b : libpng 1.2.46 : libtiff 3.9.5 : zlib 1.2.3.4
>
> Whereas previously, I didn't get anything mentioning libtiff.
>
> From there, I ran the convert command on the stackoverflow post:
>
> convert -depth 4 -density 300 -background white -flatten +matte
> united_states_v._ups_customhouse_brokerage_inc..pdf
> united_states_v._ups_customhouse_brokerage_inc2.tiff
>
> The resulting file worked well with tesseract, but it only had the last
> page of the PDF...so it's close -- very close -- but not quite there yet.
>
>
> On Sun, Feb 3, 2013 at 2:08 PM, zdenko podobny <[email protected]> wrote:
>
>> BTW: spp means Samples-per-pixel[1]. Are you able to instruct imagick to
>> use 1,3 or 4?
>> And I found report on stackoverflow[2] - there mentioned that imagick use
>> to set spp to 2, which should be invalid for tiff...
>>
>> [1] http://tpgit.github.com/Leptonica/tiffio_8c_source.html
>> [2]
>> http://stackoverflow.com/questions/5083492/problem-with-tesseract-and-tiff-format
>>
>> Zdenko
>>
>>
>> On Sun, Feb 3, 2013 at 11:00 PM, zdenko podobny <[email protected]> wrote:
>>
>>> Are you able to generate just one page or small example? Or can you
>>> provide step how you create it (so I can create it)?
>>> Tiff could be tricky. E.g. libtiff-4 do not work for me...
>>>
>>> Zdenko
>>>
>>>
>>> On Sun, Feb 3, 2013 at 10:29 PM, Mike Lissner <[email protected]> wrote:
>>>
>>>> It's about 300MB, unfortunately, but I generate it programmatically
>>>> using imagemagick in a way that's worked in the past, so I don't think the
>>>> tiff file itself is the issue.
>>>>
>>>> If you're willing to download this monster, I'll post it to dropbox.
>>>> I'd love the help, but I don't think it's the right problem.
>>>>
>>>>
>>>> On Sun, Feb 3, 2013 at 1:16 PM, zdenko podobny <[email protected]> wrote:
>>>>
>>>>> Can you send and example of you tif file?
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Sun, Feb 3, 2013 at 10:08 PM, Michael Lissner <[email protected]> wrote:
>>>>>
>>>>>> I have Ubuntu 12.04, which has tesseract 3.02 and leptonica version
>>>>>> 1.69.
>>>>>>
>>>>>> I've installed these, and also installed libtiff4 using apt-get.
>>>>>>
>>>>>> When I try to process a document, I get:
>>>>>>
>>>>>> ↪ sudo tesseract united_states_v._ups_customhouse_brokerage_inc.tif
>>>>>> united_states_v._ups_customhouse_brokerage_inc -l eng
>>>>>> Tesseract Open Source OCR Engine v3.02 with Leptonica
>>>>>> Error in pixReadFromTiffStream: spp not in set {1,3,4}
>>>>>> Error in pixReadStreamTiff: pix not read
>>>>>> Error in pixReadStream: tiff: no pix returned
>>>>>> Error in pixRead: pix not read
>>>>>> Unsupported image type.
>>>>>>
>>>>>>
>>>>>> Which seems baffling to me. I've tried reinstalling leptonica,
>>>>>> reininstalling the tiff libraries, and reinstalling tesseract in the hope
>>>>>> that they'd support tiffs once reinstalled. So far, nothing is helping.
>>>>>>
>>>>>> I was hoping that Ubuntu 12.04 would support everything i needed it
>>>>>> to without having to compile from source, but so far I've had bad luck.
>>>>>> Is there a way to make this work?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To post to this group, send email to [email protected]
>>>>>> To unsubscribe from this group, send email to
>>>>>> [email protected]
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/groups/opt_out.

