Re: More questions about page iteration

Tilman Hausherr Tue, 16 May 2017 05:41:59 -0700

On 16.05.2017 at 14:35, David Patterson wrote:

Tilman,


The code I tried is:

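import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.common.PDPageLabels;

byte[] bytes = Files.readAllBytes(Paths.get("input.pdf")); // content of file as a byte array (placeholder path)
try (PDDocument pdDocument = PDDocument.load(bytes)) { // closes the document when done
    PDDocumentCatalog cat2 = pdDocument.getDocumentCatalog();
    PDPageLabels pageLabels = cat2.getPageLabels(); // null if the catalog has no /PageLabels
    if (pageLabels == null) {
        System.out.println("Page labels missing");
    }
}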


I'm getting "Page labels missing" on each document.

Then let's go back to the beginning. You mentioned "I've got page numbers like "TOC-1", "TOC-2", "Page 1"". Where did these show up?


Tilman


I have no knowledge of, or control over, the process used to convert a Word file
into a PDF. I just inherited a bunch of PDFs that I'm trying to interpret.
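If a document has no /PageLabels entry at all, strings such as "TOC-1" or "Page 1" may exist only as text drawn on the page, for example a footer rendered by Word during conversion. One possible approach is to extract each page's text with PDFBox's PDFTextStripper (setStartPage and setEndPage set to the same 1-based page, possibly with setSortByPosition(true)) and search that text. The sketch below covers only the search step, in plain Java; the class name PrintedPageNumber and the two footer patterns are hypothetical and would need adjusting to the real documents.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Finds a printed page number such as "TOC-1" or "Page 1" in one page's extracted text.
class PrintedPageNumber {

    // Hypothetical footer formats; adjust to match the actual documents.
    private static final Pattern PRINTED_NUMBER =
            Pattern.compile("\\b(TOC-\\d+|Page \\d+)\\b");

    // Scans lines from the bottom up, assuming the footer is extracted last
    // (this depends on content stream order or on setSortByPosition(true)).
    static String find(String pageText) {
        String[] lines = pageText.split("\\R");
        for (int i = lines.length - 1; i >= 0; i--) {
            Matcher m = PRINTED_NUMBER.matcher(lines[i]);
            if (m.find()) {
                return m.group(1);
            }
        }
        return null; // no printed page number recognised
    }

    public static void main(String[] args) {
        System.out.println(find("Contents\nChapter 1 ........ 3\nTOC-1")); // TOC-1
    }
}
```

Scanning from the bottom means a "Page 9" mentioned in body text does not win over the footer on the last line.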

Dave Patterson

On Mon, May 15, 2017 at 1:57 PM, Tilman Hausherr <[email protected]> wrote:

On 15.05.2017 at 19:11, David Patterson wrote:

Alas, after testing with my documents, getPageLabels() returns null. :-(

But you said it has "TOC-1". This sounds like page labels. You can also try
PDFDebugger; it will show the labels if there are any.
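Page labels are what a PDF viewer shows instead of the physical page number. Per the PDF specification (ISO 32000-1, section 12.4.2), the catalog's /PageLabels number tree splits the document into ranges, each starting at a 0-based page index with an optional numbering style /S (D decimal, R/r upper/lower roman, A/a upper/lower letters), a prefix /P and a start value /St (default 1). Labels like "TOC-1", "TOC-2", "Page 1" would take two ranges. The following plain-Java sketch models those rules to show how a label is derived; it is not PDFBox's implementation, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java model of PDF page label ranges (ISO 32000-1, 12.4.2); not PDFBox code.
class PageLabelSketch {

    // One range: first 0-based page index, style ('D','R','r','A','a', or 0 for none),
    // prefix (/P, may be null) and start value (/St, 1 if absent in the PDF).
    static final class Range {
        final int startIndex;
        final char style;
        final String prefix;
        final int start;

        Range(int startIndex, char style, String prefix, int start) {
            this.startIndex = startIndex;
            this.style = style;
            this.prefix = prefix;
            this.start = start;
        }
    }

    // Returns the label of the page at pageIndex; ranges must be sorted by startIndex.
    static String label(List<Range> ranges, int pageIndex) {
        Range current = null;
        for (Range r : ranges) {
            if (r.startIndex <= pageIndex) {
                current = r; // last range starting at or before this page wins
            }
        }
        if (current == null) {
            return null; // page not covered by any range
        }
        int n = current.start + (pageIndex - current.startIndex);
        String prefix = current.prefix == null ? "" : current.prefix;
        switch (current.style) {
            case 'D': return prefix + n;
            case 'R': return prefix + roman(n);
            case 'r': return prefix + roman(n).toLowerCase(Locale.ROOT);
            case 'A': return prefix + letters(n);
            case 'a': return prefix + letters(n).toLowerCase(Locale.ROOT);
            default:  return prefix; // no /S: the label is the prefix alone
        }
    }

    // Upper-case roman numeral for n >= 1.
    static String roman(int n) {
        int[] values = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
        String[] symbols = {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            while (n >= values[i]) {
                sb.append(symbols[i]);
                n -= values[i];
            }
        }
        return sb.toString();
    }

    // Letters per the spec for n >= 1: A..Z, then AA..ZZ, then AAA..ZZZ, and so on.
    static String letters(int n) {
        char c = (char) ('A' + (n - 1) % 26);
        int repeat = (n - 1) / 26 + 1;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeat; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Range> ranges = Arrays.asList(
                new Range(0, 'D', "TOC-", 1),   // pages 0-1
                new Range(2, 'D', "Page ", 1)); // pages 2 onward
        for (int i = 0; i < 4; i++) {
            System.out.println(label(ranges, i));
        }
    }
}
```

Running main prints TOC-1, TOC-2, Page 1 and Page 2, one per line. Locale.ROOT keeps the lower-case roman numerals correct regardless of the default locale.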

Tilman

Thank you for the help and encouragement.

Dave Patterson

On Mon, May 15, 2017 at 12:34 PM, Tilman Hausherr <[email protected]> wrote:

On 15.05.2017 at 18:30, David Patterson wrote:

Tilman,

Thank you very much. (I feel bad asking some of the questions, but the data
is stored in "out of the way" corners that are hard to find.)

Don't :-)


Is there any documentation that explains how the linkages work? Would it
help to have the PDF standard document?

Yes. I read it all the time. The PDFBox API closely follows the PDF
specification. Here, the page labels are linked from the document catalog, so
the methods to use are in the PDDocumentCatalog class. But asking was a good
decision, as it got you that convenience method (which is in
PDFDebugger).

Tilman



Thanks.

Dave Patterson

On Mon, May 15, 2017 at 12:13 PM, Tilman Hausherr <[email protected]> wrote:

On 15.05.2017 at 15:20, David Patterson wrote:

I've now got my code working to iterate through a PDDocument and process it
page by page.

Next hurdle: is there a way to get the page number as printed? I've got
page numbers like "TOC-1", "TOC-2", "Page 1", ...

How much work is it to get the "TOC-1"?

Thanks.

Dave Patterson


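import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.common.PDPageLabels;

/**
 * Convenience method to get the page label if available.
 *
 * @param document the document to read the labels from.
 * @param pageIndex 0-based page number.
 * @return a page label, the error message if the labels could not be read,
 *         or null if not available.
 */
public static String getPageLabel(PDDocument document, int pageIndex)
{
    PDPageLabels pageLabels;
    try
    {
        pageLabels = document.getDocumentCatalog().getPageLabels();
    }
    catch (IOException ex)
    {
        // the error message is returned in place of a label
        return ex.getMessage();
    }
    if (pageLabels != null)
    {
        // one entry per page, indexed by 0-based page number
        String[] labels = pageLabels.getLabelsByPageIndices();
        if (labels[pageIndex] != null)
        {
            return labels[pageIndex];
        }
    }
    return null;
}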
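When getPageLabels() returns null, as with the documents in this thread, a caller usually still needs something to display for each page. A minimal sketch, assuming the String[] comes from getLabelsByPageIndices() (or is null when there are no labels), that falls back to the 1-based physical page number; the class and method names are hypothetical.

```java
// Chooses what to show for a page: its label if there is one, else the 1-based page number.
class PageLabelFallback {

    // labels: result of PDPageLabels.getLabelsByPageIndices(), or null when
    // getPageLabels() returned null because the catalog has no /PageLabels entry.
    static String displayLabel(String[] labels, int pageIndex) {
        if (labels != null && pageIndex >= 0 && pageIndex < labels.length
                && labels[pageIndex] != null) {
            return labels[pageIndex];
        }
        return String.valueOf(pageIndex + 1);
    }

    public static void main(String[] args) {
        String[] labels = {"TOC-1", "TOC-2", "Page 1"};
        System.out.println(displayLabel(labels, 2)); // Page 1
        System.out.println(displayLabel(null, 2));   // 3
    }
}
```

The bounds check also guards against an index past the end of the array, which the convenience method above does not do.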


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


