Thank you - that works.

Kevin Day
*trumpet**p| *480.961.6003 x1002
*e| *ke...@trumpetinc.com
*www.trumpetinc.com <http://trumpetinc.com/>*

LinkedIn <https://www.linkedin.com/company/trumpet-inc.>| Trumpet Blog
<http://trumpetinc.com/blog/>| Twitter <https://twitter.com/trumpetinc>


On Mon, Sep 23, 2019 at 8:32 PM Tilman Hausherr <thaush...@t-online.de>
wrote:

> On 23.09.2019 at 23:40, Kevin Day wrote:
> > PdfDebugger is working fine - so the issue must be with how I'm using the
> > library, or how I'm extracting the globals stream...
> >
> > I checked the globals stream contents that I'm extracting and compared to
> > the globals in PDFDebugger, and they are identical bytes.
> >
> > I also checked the image content stream, and it has identical bytes as
> > well.
> >
> >
> > I even changed my code to be identical to yours:
> >
> > JBIG2ImageReader reader = (JBIG2ImageReader)
> >     ImageIO.getImageReadersByFormatName("JBIG2").next();
> > JBIG2Globals globals =
> >     reader.processGlobals(ImageIO.createImageInputStream(
> >         new ByteArrayInputStream(globalBytes)));
> > reader.setGlobals(globals);
> > reader.setInput(ImageIO.createImageInputStream(
> >     new ByteArrayInputStream(imageBytes)));
> > return reader.read(0, reader.getDefaultReadParam());
> >
> > and it still fails.
> >
> > But PDFDebugger works fine.
> >
> >
> > So it would seem like the way that PDFBox invokes JBIG2ImageReader is not
> > the above? Could that be right??
> >
> That is true, we're using the reader in a plugin-independent way, which
> is shown in the source of JBIG2Filter.java:
>
>
> InputStream encoded = the input stream of the main image (without the
> globals)
>
> InputStream source = encoded;
>
> // if globals are present:
> source = new SequenceInputStream(((COSStream)
>     globals).createInputStream(), encoded);
>
> ...
>
> ImageInputStream iis = ImageIO.createImageInputStream(source);
>
> reader.setInput(iis);
>
> image = reader.read(0, irp);
>
>
> Tilman
>
>
> >
> > - K
> >
> >
> > On Fri, Sep 20, 2019 at 9:28 PM Tilman Hausherr <thaush...@t-online.de>
> > wrote:
> >
> >> I wonder if the PDF can be displayed with PDFDebugger. If no => bug. If
> >> yes, then you should debug this to see what calls are done, and whether
> >> you have the same data input. Your calls seem to be OK, they look
> >> similar to those I did when I debugged something in the jbig2 reader
> >> (link is before it went to Apache, don't open issues on github):
> >> https://github.com/levigo/jbig2-imageio/issues/21
> >>
> >> Tilman
> >>
> >> On 20.09.2019 at 22:23, Kevin Day wrote:
> >>> I am trying to use JBIG2ImageReader to parse JBIG2 data from a PDF (the
> >>> image stream and globals are being provided - we are not using PdfBox to
> >>> parse the PDF itself). Please let me know if I should be using a
> >>> different communication avenue for JBIG2 specific questions.
> >>>
> >>>
> >>> Here's what I'm trying to do:
> >>>
> >>> JBIG2ImageReader jbig2Reader =
> >>>     new JBIG2ImageReader(new JBIG2ImageReaderSpi());
> >>>
> >>> byte[] globalBytes = // raw bytes from PDF DECODEPARAMS, JBIG2GLOBALS
> >>>
> >>> ImageInputStream globalsInputStream =
> >>>     new DefaultInputStreamFactory().getInputStream(
> >>>         new ByteArrayInputStream(globalBytes));
> >>>
> >>> JBIG2Globals globals = jbig2Reader.processGlobals(globalsInputStream);
> >>> jbig2Reader.setGlobals(globals);
> >>>
> >>> byte[] imageBytes = // raw JBIG2 image stream bytes from PDF
> >>> ImageInputStream imageInputStream =
> >>>     new DefaultInputStreamFactory().getInputStream(
> >>>         new ByteArrayInputStream(image.getImageAsBytes()));
> >>> jbig2Reader.setInput(imageInputStream);
> >>>
> >>> return jbig2Reader.read(0);
> >>>
> >>>
> >>> When I do this, I get a null pointer exception:
> >>>
> >>> Exception in thread "main" java.lang.RuntimeException: Can't instantiate segment class
> >>>     at org.apache.pdfbox.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:420)
> >>>     at org.apache.pdfbox.jbig2.JBIG2Page.createNormalPage(JBIG2Page.java:202)
> >>>     at org.apache.pdfbox.jbig2.JBIG2Page.createPage(JBIG2Page.java:168)
> >>>     at org.apache.pdfbox.jbig2.JBIG2Page.composePageBitmap(JBIG2Page.java:157)
> >>>     at org.apache.pdfbox.jbig2.JBIG2Page.getBitmap(JBIG2Page.java:133)
> >>>     at org.apache.pdfbox.jbig2.JBIG2ImageReader.read(JBIG2ImageReader.java:249)
> >>>     at javax.imageio.ImageReader.read(ImageReader.java:939)
> >>>
> >>> ....
> >>>
> >>> Caused by: java.lang.NullPointerException
> >>>     at org.apache.pdfbox.jbig2.segments.TextRegion.initSymbols(TextRegion.java:1010)
> >>>     at org.apache.pdfbox.jbig2.segments.TextRegion.getSymbols(TextRegion.java:273)
> >>>     at org.apache.pdfbox.jbig2.segments.TextRegion.parseHeader(TextRegion.java:154)
> >>>     at org.apache.pdfbox.jbig2.segments.TextRegion.init(TextRegion.java:1128)
> >>>     at org.apache.pdfbox.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:413)
> >>>     ... 19 more
> >>>
> >>>
> >>> The SegmentHeader array in TextRegion looks like this:
> >>>
> >>> (org.apache.pdfbox.jbig2.SegmentHeader[]) [null,
> >>>
> >>> #SegmentNr: 377
> >>> SegmentType: 0
> >>> PageAssociation: 1
> >>> Referred-to segments: none
> >>> ]
> >>>
> >>>
> >>> Note that the first element is null. I'm not sure why this is (maybe
> >>> it's not a valid JBIG2 data stream??). This file opens and displays fine
> >>> in PDF viewers, so I'm assuming it must be something that I'm doing wrong.
> >>>
> >>>
> >>> Any pointers?
> >>>
> >>> - K
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >> For additional commands, e-mail: users-h...@pdfbox.apache.org
> >>
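
For anyone finding this thread in the archive: the approach from Tilman's reply (hand the reader one stream with the globals bytes followed by the image bytes, instead of calling processGlobals/setGlobals) might be sketched roughly as below. This is only a sketch under assumptions, not the code Kevin ended up with and not a copy of JBIG2Filter.java. The class name EmbeddedJbig2Decoder and the methods concat and decode are hypothetical names made up for illustration, and decode assumes the JBIG2 ImageIO plugin is on the classpath.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical helper for illustration; not part of PDFBox or the JBIG2 plugin. */
final class EmbeddedJbig2Decoder {

    /** Returns one logical stream: globals bytes first (if any), then the image bytes. */
    static InputStream concat(byte[] globalBytes, byte[] imageBytes) {
        InputStream image = new ByteArrayInputStream(imageBytes);
        if (globalBytes == null || globalBytes.length == 0) {
            return image; // no JBIG2Globals entry: the image stream stands alone
        }
        return new SequenceInputStream(new ByteArrayInputStream(globalBytes), image);
    }

    /** Decodes the first image using whichever ImageIO reader is registered for "JBIG2". */
    static BufferedImage decode(byte[] globalBytes, byte[] imageBytes) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("JBIG2");
        if (!readers.hasNext()) {
            // Assumption: the plugin jar is expected on the classpath.
            throw new IllegalStateException("No JBIG2 ImageIO plugin found");
        }
        ImageReader reader = readers.next();
        try (ImageInputStream iis =
                ImageIO.createImageInputStream(concat(globalBytes, imageBytes))) {
            reader.setInput(iis);
            return reader.read(0, reader.getDefaultReadParam());
        } finally {
            reader.dispose();
        }
    }
}
```

With the bytes from the original question, the call would be EmbeddedJbig2Decoder.decode(globalBytes, imageBytes). Only the generic javax.imageio.ImageReader type is used here, which matches the "plugin-independent" usage described above.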