On Monday, June 29, 2015 at 3:57:08 AM UTC-4, Jeff Breidenbach wrote:
> Not available currently, and pretty major effort required to make it
> happen, both in Leptonica and Tesseract's PDF output module. No plans
> to work on this. For other formats we try hard to not re-encode during
> PDF generation whenever practical.

There's a JBIG2 encoder here: https://github.com/agl/jbig2enc

Since it uses Leptonica for some of its internal operations, adding it to Leptonica might be a little cyclical (or require some restructuring).

While Jeff obviously has more experience than I do, it seems like it should be a straightforward integration. Non-trivial to be sure, but certainly doable. The PDF output module already supports multiple encodings, including 1-bit G4, so it *seems* like it should mainly be a matter of filtering/transforming the segments in the JBIG2 stream and creating a stream for the global symbols, if needed.

Jeff - are there particular trouble spots that you foresee if someone were to tackle this?

Tom
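
To make the "image stream plus optional global symbols stream" idea concrete, here is a minimal Python sketch of the PDF objects involved. It is only an illustration, not Tesseract's or Leptonica's code: the function names, object numbers and byte strings are hypothetical, and it assumes the JBIG2 data has already been reduced to embeddable segments (as I understand the PDF spec, the file header and the end-of-page/end-of-file segments are not used in an embedded stream). The `/JBIG2Decode` filter and the `/JBIG2Globals` entry in `/DecodeParms` are the standard PDF names.

```python
from typing import List, Optional


def pdf_stream(obj_num: int, entries: List[str], data: bytes) -> bytes:
    """Serialize one PDF stream object with the given dictionary entries."""
    dict_body = " ".join(entries + [f"/Length {len(data)}"])
    header = f"{obj_num} 0 obj\n<< {dict_body} >>\nstream\n".encode("ascii")
    return header + data + b"\nendstream\nendobj\n"


def jbig2_image_objects(page_segments: bytes, width: int, height: int,
                        image_obj: int,
                        global_segments: Optional[bytes] = None,
                        globals_obj: Optional[int] = None) -> bytes:
    """Build the image XObject, preceded by a globals stream if one is given.

    page_segments / global_segments are assumed to be already-filtered
    JBIG2 segment data; this function does no JBIG2 parsing itself.
    """
    out = b""
    entries = [
        "/Type /XObject", "/Subtype /Image",
        f"/Width {width}", f"/Height {height}",
        "/ColorSpace /DeviceGray", "/BitsPerComponent 1",
        "/Filter /JBIG2Decode",
    ]
    if global_segments is not None:
        if globals_obj is None:
            raise ValueError("globals_obj is required with global_segments")
        # The shared symbol dictionary lives in its own stream object...
        out += pdf_stream(globals_obj, [], global_segments)
        # ...and the image refers to it through /DecodeParms.
        entries.append(f"/DecodeParms << /JBIG2Globals {globals_obj} 0 R >>")
    out += pdf_stream(image_obj, entries, page_segments)
    return out


if __name__ == "__main__":
    # Placeholder bytes stand in for real JBIG2 segment data.
    objs = jbig2_image_objects(b"PAGE", 2480, 3508, image_obj=8,
                               global_segments=b"SYM", globals_obj=7)
    print(objs.decode("latin-1"))
```

The hard part is of course not shown here: producing the segment data and deciding which segments belong in the globals stream is where the encoder integration would actually happen.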

