Thank you - that works.

Kevin Day

trumpet | p: 480.961.6003 x1002
e: ke...@trumpetinc.com
www.trumpetinc.com <http://trumpetinc.com/>

LinkedIn <https://www.linkedin.com/company/trumpet-inc.> | Trumpet Blog <http://trumpetinc.com/blog/> | Twitter <https://twitter.com/trumpetinc>


On Mon, Sep 23, 2019 at 8:32 PM Tilman Hausherr <thaush...@t-online.de>
wrote:

> On 23.09.2019 at 23:40, Kevin Day wrote:
> > PDFDebugger works fine, so the issue must be with how I'm using the
> > library or how I'm extracting the globals stream...
> >
> > I checked the globals stream contents that I'm extracting and compared to
> > the globals in PDFDebugger, and they are identical bytes.
> >
> > I also checked the image content stream, and its bytes are identical as
> > well.
> >
> >
> > I even changed my code to be identical to yours:
> >
> >     JBIG2ImageReader reader = (JBIG2ImageReader)
> >             ImageIO.getImageReadersByFormatName("JBIG2").next();
> >     JBIG2Globals globals = reader.processGlobals(
> >             ImageIO.createImageInputStream(new ByteArrayInputStream(globalBytes)));
> >     reader.setGlobals(globals);
> >     reader.setInput(ImageIO.createImageInputStream(
> >             new ByteArrayInputStream(imageBytes)));
> >     return reader.read(0, reader.getDefaultReadParam());
> >
> > and it still fails.
> >
> > But PDFDebugger works fine.
> >
> >
> > So it seems that PDFBox doesn't invoke JBIG2ImageReader the way shown
> > above? Could that be right?
>
>
> That is true; we use the reader in a plugin-independent way, as shown in
> the source of JBIG2Filter.java:
>
>     InputStream encoded = ...; // input stream of the main image (without the globals)
>     InputStream source = encoded;
>     if (globals instanceof COSStream)
>     {
>         // prepend the /JBIG2Globals data to the image data in one stream
>         source = new SequenceInputStream(((COSStream) globals).createInputStream(), encoded);
>     }
>     ...
>     ImageInputStream iis = ImageIO.createImageInputStream(source);
>     reader.setInput(iis);
>     ImageReadParam irp = reader.getDefaultReadParam();
>     ...
>     image = reader.read(0, irp);
>
>
>
> Tilman
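
For readers in Kevin's situation (raw stream bytes, no PDFBox document model), the JBIG2Filter approach above can be reproduced with plain byte arrays: prepend the globals to the image data in a single stream instead of passing them through `setGlobals`. This is the approach Kevin reports as working at the top of the thread. The following is a minimal sketch, assuming the PDFBox JBIG2 ImageIO plugin is on the classpath and registers a reader under the format name "JBIG2" (as in the snippets in this thread); the class and method names `Jbig2Decode`, `concatenate` and `decode` are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper: decodes a PDF-embedded JBIG2 image from raw bytes.
final class Jbig2Decode {

    /** Returns the globals bytes (if any) followed by the image bytes as one stream. */
    static InputStream concatenate(byte[] globalBytes, byte[] imageBytes) {
        InputStream encoded = new ByteArrayInputStream(imageBytes);
        if (globalBytes == null || globalBytes.length == 0) {
            return encoded; // no /JBIG2Globals: read the image stream on its own
        }
        // Same idea as JBIG2Filter: globals first, then the image segments
        return new SequenceInputStream(new ByteArrayInputStream(globalBytes), encoded);
    }

    /** Looks up a "JBIG2" ImageReader via ImageIO and decodes the first image. */
    static BufferedImage decode(byte[] globalBytes, byte[] imageBytes) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("JBIG2");
        if (!readers.hasNext()) {
            throw new IOException("No JBIG2 ImageReader registered; is the JBIG2 plugin on the classpath?");
        }
        ImageReader reader = readers.next();
        try (ImageInputStream iis =
                     ImageIO.createImageInputStream(concatenate(globalBytes, imageBytes))) {
            if (iis == null) {
                throw new IOException("Could not create an ImageInputStream");
            }
            reader.setInput(iis);
            ImageReadParam irp = reader.getDefaultReadParam();
            return reader.read(0, irp);
        } finally {
            reader.dispose(); // release reader resources; the decoded image stays valid
        }
    }
}
```

Because the reader sees one continuous stream, it never needs `processGlobals`/`setGlobals`, so the code only depends on the standard `javax.imageio` API and works with whichever JBIG2 reader is registered.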
>
>
> >
> > - K
> >
> >
> > On Fri, Sep 20, 2019 at 9:28 PM Tilman Hausherr <thaush...@t-online.de>
> > wrote:
> >
> >> I wonder whether the PDF can be displayed with PDFDebugger. If not => bug.
> >> If yes, then you should debug it to see which calls are made, and whether
> >> you have the same input data. Your calls seem OK; they look similar to
> >> the ones I made when I debugged something in the jbig2 reader (the link
> >> is from before it moved to Apache; don't open issues on GitHub):
> >> https://github.com/levigo/jbig2-imageio/issues/21
> >>
> >> Tilman
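
One way to check "whether you have the same input data" is to fingerprint the bytes your own code extracts and compare them with the bytes from the working path, however you obtain those. A minimal sketch using only the JDK; `StreamFingerprint`, `sha256Hex` and `firstDifference` are hypothetical helper names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helpers for comparing two extractions of the same PDF stream.
final class StreamFingerprint {

    /** Hex-encoded SHA-256 of the given bytes; equal hashes mean equal input. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // negative bytes print as ff, fe, ...
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }

    /** Index of the first differing byte, or -1 if the arrays are identical. */
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // one array is a prefix of the other: they differ at the shorter length
        return a.length == b.length ? -1 : n;
    }
}
```

Identical bytes do not rule out a difference in how the reader is invoked; in this thread the globals and image bytes matched, and the difference turned out to be how the streams were passed to the reader.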
> >>
> >> On 20.09.2019 at 22:23, Kevin Day wrote:
> >>> I am trying to use JBIG2ImageReader to parse JBIG2 data from a PDF (the
> >>> image stream and globals are being provided; we are not using PDFBox to
> >>> parse the PDF itself). Please let me know if I should be using a different
> >>> communication avenue for JBIG2-specific questions.
> >>>
> >>>
> >>> Here's what I'm trying to do:
> >>>
> >>>     JBIG2ImageReader jbig2Reader = new JBIG2ImageReader(new JBIG2ImageReaderSpi());
> >>>
> >>>     byte[] globalBytes = ...; // raw bytes of the /JBIG2Globals stream from /DecodeParms
> >>>     ImageInputStream globalsInputStream = new DefaultInputStreamFactory()
> >>>             .getInputStream(new ByteArrayInputStream(globalBytes));
> >>>     JBIG2Globals globals = jbig2Reader.processGlobals(globalsInputStream);
> >>>     jbig2Reader.setGlobals(globals);
> >>>
> >>>     byte[] imageBytes = ...; // raw JBIG2 image stream bytes from the PDF
> >>>     ImageInputStream imageInputStream = new DefaultInputStreamFactory()
> >>>             .getInputStream(new ByteArrayInputStream(imageBytes));
> >>>     jbig2Reader.setInput(imageInputStream);
> >>>
> >>>     return jbig2Reader.read(0);
> >>>
> >>>
> >>> When I do this, I get a null pointer exception:
> >>>
> >>> Exception in thread "main" java.lang.RuntimeException: Can't instantiate segment class
> >>>     at org.apache.pdfbox.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:420)
> >>>     at org.apache.pdfbox.jbig2.JBIG2Page.createNormalPage(JBIG2Page.java:202)
> >>>     at org.apache.pdfbox.jbig2.JBIG2Page.createPage(JBIG2Page.java:168)
> >>>     at org.apache.pdfbox.jbig2.JBIG2Page.composePageBitmap(JBIG2Page.java:157)
> >>>     at org.apache.pdfbox.jbig2.JBIG2Page.getBitmap(JBIG2Page.java:133)
> >>>     at org.apache.pdfbox.jbig2.JBIG2ImageReader.read(JBIG2ImageReader.java:249)
> >>>     at javax.imageio.ImageReader.read(ImageReader.java:939)
> >>>     ...
> >>> Caused by: java.lang.NullPointerException
> >>>     at org.apache.pdfbox.jbig2.segments.TextRegion.initSymbols(TextRegion.java:1010)
> >>>     at org.apache.pdfbox.jbig2.segments.TextRegion.getSymbols(TextRegion.java:273)
> >>>     at org.apache.pdfbox.jbig2.segments.TextRegion.parseHeader(TextRegion.java:154)
> >>>     at org.apache.pdfbox.jbig2.segments.TextRegion.init(TextRegion.java:1128)
> >>>     at org.apache.pdfbox.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:413)
> >>>     ... 19 more
> >>>
> >>> The SegmentHeader array in TextRegion looks like this:
> >>>
> >>>    (org.apache.pdfbox.jbig2.SegmentHeader[]) [null,
> >>>
> >>> #SegmentNr: 377
> >>> SegmentType: 0
> >>> PageAssociation: 1
> >>> Referred-to segments: none
> >>> ]
> >>>
> >>> Note that the first element is null. I'm not sure why (maybe it's not a
> >>> valid JBIG2 data stream?). This file opens and displays fine in PDF
> >>> viewers, so I'm assuming it must be something I'm doing wrong.
> >>>
> >>>
> >>> Any pointers?
> >>>
> >>> - K
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >> For additional commands, e-mail: users-h...@pdfbox.apache.org
> >>
> >>
>