On 31 July 2014 20:27, Tom <[email protected]> wrote:
> Dear Jim,
>
Tom, I want to say before anything else that I very much appreciate this follow-up message. I'm glad that you took the time to rephrase your position.

> thanks for your explanation. I also studied the two codes (one part is in
> Leptonica, the other, more important, in Tesseract). I think forcing the use of
> "FLATE" just before the image is rendered into the PDF page is the best
> solution. I kindly ask you to try my (short and easy) patch and to inspect
> the generated files, which were also smaller in my test cases.
>

For your use case, undoubtedly. For other use cases, I'm not quite convinced. For many users of Tesseract, the input will be a full colour scan of a page image, and the objective will be to have the smallest file size. I'm quite sure that this use case is what led to the PDF feature -- I don't think it's a coincidence that the Tesseract team are located in the Google Books building, and Google Books offers such PDFs!

As you've identified in this message, it wasn't my intention to offer a workaround -- in fact, I think there may be an extra issue here: Tesseract is perhaps a little too willing to believe the colour depth reported in the image. That requires some more investigation.

Quite aside from the issue at hand, I think it's worth telling you that, in general, sending a patch that comments out code to an open source project will (usually) result in automatic rejection. Remove the code, or don't -- but don't leave ugly commented-out code.

> Please let me know if you want me to perform some test cases with B/W and
> also colored pages (text plus images and so on), but if this step can be skipped, I
> would be happy because I don't have that much time. On the other hand, I really
> want to have my patch pulled in, or an additional command line parameter
> like "--force-lossless-compression" for the "pdf" mode.
>

Having it as an option is, I think, the best for everyone, and all use cases. Otherwise, the tests are unavoidable.
I think this would be a good option for users to have -- that's why I commented on the issue -- so I'd be happy to add it (I should have time over the weekend).

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

