I was wrong.  According to Adobe TN5116, the stream is the whole JFIF file,
including the APP0 headers, etc.
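To check whether an image stream really is a whole JFIF file in the TN5116 sense, you can walk its marker segments and see whether an APP0 segment with the "JFIF" identifier follows the SOI marker.  Below is a minimal sketch that assumes you already have the stream's bytes in a byte array.  The class and method names are hypothetical, and it ignores corner cases of the JPEG syntax such as 0xFF fill bytes between segments.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class JpegMarkers {

    /**
     * Walks the marker segments of a JPEG byte array, starting at SOI, and
     * returns a short name for each segment up to and including SOS, after
     * which entropy-coded image data follows.
     */
    public static List<String> listMarkers(byte[] data) {
        List<String> out = new ArrayList<>();
        // A JPEG/JFIF file must begin with SOI = 0xFF 0xD8.
        if (data.length < 2 || (data[0] & 0xFF) != 0xFF || (data[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("no SOI marker at offset 0");
        }
        out.add("SOI");
        int pos = 2;
        while (pos + 4 <= data.length) {
            if ((data[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("expected a marker at offset " + pos);
            }
            int marker = data[pos + 1] & 0xFF;
            // Segment length is big-endian and counts its own two bytes.
            int len = ((data[pos + 2] & 0xFF) << 8) | (data[pos + 3] & 0xFF);
            String name = String.format("FF%02X", marker);
            if (marker == 0xE0 && len >= 7 && pos + 9 <= data.length) {
                // APP0: a JFIF header carries the 5-byte identifier "JFIF\0".
                String id = new String(data, pos + 4, 5, StandardCharsets.ISO_8859_1);
                name = id.equals("JFIF\0") ? "APP0 (JFIF)" : "APP0";
            } else if (marker == 0xDA) {
                name = "SOS";
            }
            out.add(name);
            if (marker == 0xDA) {
                break; // compressed scan data follows; stop walking here
            }
            pos += 2 + len;
        }
        return out;
    }
}
```

If the list starts with `SOI` followed by `APP0 (JFIF)`, the stream contains the full JFIF header rather than bare DCT data.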

So, has anyone seen a problem like this?  I'm starting to suspect I have a
bad JVM/ImageIO; I'm going to try running my code on another system.
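One way to test the ImageIO theory on each system is to mimic the decode/re-encode step that PDJpeg appears to perform and check that the output is still a well-formed JPEG (SOI at the start, EOI at the end).  This is only a rough approximation: it does not call PDFBox at all, and the class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.imageio.ImageIO;

public class ImageIoRoundTrip {

    /** Decodes JPEG bytes with ImageIO, then re-encodes them as a new JPEG. */
    public static byte[] reencode(byte[] jpeg) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(jpeg));
        if (img == null) {
            throw new IOException("no ImageIO reader could decode the input");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(img, "jpg", out)) {
            throw new IOException("no ImageIO JPEG writer is available");
        }
        return out.toByteArray();
    }

    /**
     * Strict check: true only if the bytes start with SOI (FF D8) and end
     * with EOI (FF D9). Files with trailing bytes after EOI will fail it.
     */
    public static boolean looksLikeCompleteJpeg(byte[] b) {
        int n = b.length;
        return n >= 4
                && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xD8
                && (b[n - 2] & 0xFF) == 0xFF && (b[n - 1] & 0xFF) == 0xD9;
    }

    public static void main(String[] args) throws IOException {
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        byte[] again = reencode(original);
        System.out.println("original well-formed: " + looksLikeCompleteJpeg(original));
        System.out.println("re-encoded " + again.length + " bytes, well-formed: "
                + looksLikeCompleteJpeg(again));
    }
}
```

Running it on the same thumbnail (e.g. `java ImageIoRoundTrip thumb.jpg`) on both machines and comparing the output would show whether ImageIO itself is producing odd data.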


On Mon, Feb 13, 2012 at 8:57 AM, Jason Cwik wrote:

> Hi All,
>
> I'm using pdfbox 1.6 to generate PDF files.  These files contain some
> simple text and JPEG images.  The JPEGs are small (~157x200), representing
> thumbnails of other documents.
>
> The problem is, only about half of my images display.  The rest have a
> blank box where the image should be.  Also, if I run a viewer such as
> pdfedit or evince from the command line, I see errors:
>
> jason@butters:~/Desktop$ evince msg4.pdf
> Error: Could not find start of jpeg data
> Error: Could not find start of jpeg data
> Error: Could not find start of jpeg data
> Error: Could not find start of jpeg data
> Error: Could not find start of jpeg data
>
>
> Looking at PDJpeg, it appears to read my JPEG into a BufferedImage and
> then recompress it into the stream.  The problem (I think) is that,
> according to the PDF spec, the stream should be just the raw DCT data.
> However, when I look at the PDFs generated by PDFBox, I see the JPEG
> headers (e.g. 0xff, ... "JFIF") in the stream.  It seems the PDF viewers
> are being lenient and trying to find the DCT data, but giving up on some
> of my images.
>
> Does this sound correct?
>
> Thanks,
> Jason
>
>
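The "Could not find start of jpeg data" message suggests the viewer could not locate the JPEG start-of-image marker (0xFF 0xD8) in the stream data.  A quick way to see where, or whether, that marker appears in a stream you have already extracted from the PDF is a byte scan like the sketch below.  The names are hypothetical, and requiring a third 0xFF byte is a heuristic based on SOI always being followed by another marker.

```java
public class SoiScan {

    /**
     * Returns the offset of the first SOI marker (FF D8) that is followed by
     * another marker byte (FF), or -1 if no such sequence is found.
     */
    public static int findSoi(byte[] data) {
        for (int i = 0; i + 2 < data.length; i++) {
            if ((data[i] & 0xFF) == 0xFF
                    && (data[i + 1] & 0xFF) == 0xD8
                    && (data[i + 2] & 0xFF) == 0xFF) {
                return i;
            }
        }
        return -1;
    }
}
```

An offset of 0 means the stream begins the way a JPEG should; a positive offset means there are extra bytes in front of the image data; -1 means no SOI marker was found at all.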



-- 
Jason Cwik
CTO
Connectic, Inc
Cell: 612-217-0442
