Re: PDFBOX 2 scanned documents

Tilman Hausherr Wed, 23 Sep 2015 10:31:47 -0700

The number of XObjects should be the same in versions 1 and 2.

If you don't want to share the PDFs, then look at them with the new PDFDebugger. You can see the XObject images easily.


Tilman

On 23.09.2015 at 19:21, Tim Daley wrote:

Here's the basic code that used to work. Granted, it probably depends heavily on Version 1's structure.



PDPage pdPage = CFCAPDFInputProgressBar.this.pdPages.get(i);
// Note: nothing in this excerpt populates the map, so the loop below never runs as posted.
Map<COSName, PDXObject> images = new TreeMap<COSName, PDXObject>();
PDResources pdResources = pdPage.getResources();
for (Entry<COSName, PDXObject> objectImageEntry : images.entrySet())
{
  PDXObject pdXObject = objectImageEntry.getValue();
  if (pdXObject instanceof PDImageXObject)
  {
    PDImageXObject pdXObjectImage = (PDImageXObject) pdXObject;
    BufferedImage bufferedImage = null;
    try
    {
      bufferedImage = pdXObjectImage.getImage();
    }
    catch (Throwable t)
    {
      t.printStackTrace();
      randomAccessFile.close();
      throw new RuntimeException(t);
    }
    // Landscape scans are rotated before being split or stored.
    if (CFCAPDFInputProgressBar.this.music.getLandscape())
      bufferedImage = rotate90DX(bufferedImage);
    int width = bufferedImage.getWidth();
    int height = bufferedImage.getHeight();
    if (CFCAPDFInputProgressBar.this.music.getTwoPage())
    {
      // Each scan holds two pages side by side: store the right half, then the left half.
      width /= 2;
      boolean even = i % 2 == 0;
      int rightPageNo = even ? i + 1 : pageCount * 2 - i;
      int leftPageNo = even ? pageCount * 2 - i : i + 1;
      putPage(bufferedImage, rightPageNo, width, 0, width, height);
      putPage(bufferedImage, leftPageNo, 0, 0, width, height);
    }
    else
    {
      int pageNo = CFCAPDFInputProgressBar.this.music.getStart() + i;
      putPage(bufferedImage, pageNo, 0, 0, width, height);
    }
  }
}
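
The two-page branch of the code above can be illustrated in isolation. The following is only a minimal sketch using the standard library, not PDFBox and not the poster's actual helpers: the class name BookletSplit and both method names are hypothetical, and it assumes that i is the 0-based index of a scan and that pageCount is the number of scans, each holding two pages side by side.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of PDFBox or of the posted code.
final class BookletSplit {

    /** Returns {leftPageNo, rightPageNo} for the scan at 0-based index i. */
    static int[] leftRightPageNumbers(int i, int pageCount) {
        boolean even = i % 2 == 0;
        // Same arithmetic as the two-page branch in the posted code.
        int right = even ? i + 1 : pageCount * 2 - i;
        int left = even ? pageCount * 2 - i : i + 1;
        return new int[] { left, right };
    }

    /** Splits a two-page scan into {leftHalf, rightHalf}. */
    static BufferedImage[] splitHalves(BufferedImage scan) {
        int half = scan.getWidth() / 2;
        int height = scan.getHeight();
        // getSubimage returns views that share pixel data with the source image.
        return new BufferedImage[] {
            scan.getSubimage(0, 0, half, height),
            scan.getSubimage(half, 0, half, height)
        };
    }

    public static void main(String[] args) {
        int pageCount = 2; // assumption: two scans, i.e. both sides of one folded sheet
        for (int i = 0; i < pageCount; i++) {
            int[] lr = leftRightPageNumbers(i, pageCount);
            System.out.println("scan " + i + ": left=" + lr[0] + ", right=" + lr[1]);
        }
    }
}
```

With two scans, this numbering places pages 4 and 1 on the first scan and pages 2 and 3 on the second, which matches a folded booklet layout.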

On Wed, Sep 23, 2015 at 1:06 PM, Tilman Hausherr wrote:


    On 23.09.2015 at 17:33, Tim Daley wrote:

        It appears that PDFBOX 2 handles scanned documents differently
        than PDFBOX 1.

        I have multipage PDFs that I have scanned from a Konica/Minolta
        C224e. The PDFs in version 1 seemed to come in as a single image.
        Now in version 2, they seem to come in as multiple images. I
        assume this is to reduce the size of the resultant PDFs.

        Is there a way to retrieve each page as a single image or is
        there a method to merge all the images on a page into a single
        image?


    Can't comment without having a sample PDF. And I don't know what
    you mean by "seemed to come in as a single image".

    Tilman

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: [email protected]
    For additional commands, e-mail: [email protected]




--
*Tim Daley*
IT Specialist-Operating Systems
cru | Engagement & Services | Platform Team
o:407-826-2911 | m:407-716-0284



