[tesseract-ocr] Re: jbig2 encoding in PDF output file

Tom Morris Wed, 01 Jul 2015 11:40:47 -0700

On Monday, June 29, 2015 at 3:57:08 AM UTC-4, Jeff Breidenbach wrote:
>
> Not available currently, and pretty major effort required to make it
> happen, both in Leptonica and Tesseract's PDF output module. No plans
> to work on this. For other formats we try hard to not re-encode during
> PDF generation whenever practical.
>


There's a JBIG2 encoder here: https://github.com/agl/jbig2enc
Since it uses Leptonica for some of its internal operations, adding it to
Leptonica would create a circular dependency (or require some restructuring).

While Jeff obviously has more experience than I, it seems like it should be 
a straightforward integration.  Non-trivial to be sure, but certainly 
doable.  The PDF output module already supports multiple encodings, 
including the 1-bit G4, so it *seems* like it should mainly be a matter of 
filtering/transforming the segments in the JBIG2 stream and creating a 
separate stream for the global symbols, if needed.
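
To make that a bit more concrete, here is a rough Python sketch (not 
Tesseract or Leptonica code) of what the PDF side might look like. The 
function names, object numbers and placeholder payloads are all made up for 
illustration. It assumes the JBIG2 data has already been reduced to the 
embedded form the PDF spec asks for, i.e. no file header and no end-of-page 
or end-of-file segments. I believe jbig2enc has a PDF-oriented output mode 
that does this, but check its documentation rather than taking my word for 
it.

```python
from typing import List, Optional


def pdf_stream_object(obj_num: int, entries: str, payload: bytes) -> bytes:
    """Serialize one indirect PDF stream object with the given dictionary entries."""
    header = f"{obj_num} 0 obj\n<< {entries}/Length {len(payload)} >>\nstream\n"
    return header.encode("ascii") + payload + b"\nendstream\nendobj\n"


def jbig2_image_objects(page_data: bytes, width: int, height: int,
                        image_obj: int,
                        globals_data: Optional[bytes] = None,
                        globals_obj: Optional[int] = None) -> List[bytes]:
    """Build the PDF objects for one JBIG2-compressed 1-bit page image.

    page_data and globals_data are assumed to be embedded-form JBIG2
    segments; this function does no filtering of its own.
    """
    objects = []
    decode_parms = ""
    if globals_data is not None:
        if globals_obj is None:
            raise ValueError("globals_obj is required when globals_data is given")
        # Shared symbol dictionary: a plain stream that page images point to.
        objects.append(pdf_stream_object(globals_obj, "", globals_data))
        decode_parms = f"/DecodeParms << /JBIG2Globals {globals_obj} 0 R >> "
    entries = (
        "/Type /XObject /Subtype /Image "
        f"/Width {width} /Height {height} "
        "/ColorSpace /DeviceGray /BitsPerComponent 1 "
        f"/Filter /JBIG2Decode {decode_parms}"
    )
    objects.append(pdf_stream_object(image_obj, entries, page_data))
    return objects


if __name__ == "__main__":
    # Placeholder payloads only; real input would be encoder output.
    for obj in jbig2_image_objects(b"<page segments>", 2480, 3508, image_obj=6,
                                   globals_data=b"<symbol dictionary>",
                                   globals_obj=5):
        print(obj.decode("latin-1"))
```

The globals stream is optional: an encoder running without a shared symbol 
dictionary would produce only the page data, and the image object then 
carries no /DecodeParms entry at all.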

Jeff - are there particular troublespots that you foresee if someone were 
to tackle this?

Tom

