On 31 July 2014 20:27, Tom <[email protected]> wrote:
> Dear Jim,
>

Tom, I want to say before anything else that I very much appreciate
this followup message. I'm glad that you took the time to rephrase
your position.

> thanks for your explanation. I also studied the two pieces of code (one part
> is in Leptonica, the other, more important, in Tesseract). I think forcing the
> use of "FLATE" just before the image is rendered into the PDF page is the best
> solution. I kindly ask you to try my (short and easy) patch and to inspect
> the generated files, which were also smaller in my test cases.
>

For your use case, undoubtedly. For other use cases, I'm not quite
convinced. For many users of Tesseract, the input will be a full
colour scan of a page image, and the objective will be to have the
smallest file size. I'm quite sure that this use case is what led to
the PDF feature -- I don't think it's a coincidence that the Tesseract
team are located in the Google Books building, and Google Books offers
such PDFs!

As you've identified in this message, it wasn't my intention to offer
a workaround -- in fact, I think there may be an extra issue here,
that Tesseract is perhaps a little too willing to believe the colour
depth reported in the image. That requires some more investigation.

Quite aside from the issue at hand, I think it's worth telling you
that, in general, sending a patch that comments out code to an open
source project will (usually) result in automatic rejection. Remove
the code, or don't -- don't leave ugly commented-out code.

> Please let me know if you want me to run some test cases with B/W and
> also colored pages (text plus images and so on), but if this step can be
> skipped, I would be happy because I don't have that much time. On the other
> hand, I really want to have my patch pulled in, or an additional command line
> parameter like "--force-lossless-compression" for the "pdf" mode.
>

Having it as an option is, I think, the best for everyone and for all
use cases. Otherwise, the tests are unavoidable. I think this would be a
good option for users to have -- that's why I commented on the issue
-- so I'd be happy to add it (I should have time over the weekend).

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
