Tilman,

The code I tried is:

byte[] bytes = ...; // content of the file as a byte array
PDDocument pdDocument = PDDocument.load( bytes );
PDDocumentCatalog cat2 = pdDocument.getDocumentCatalog();
PDPageLabels pageLabels = cat2.getPageLabels();
if ( pageLabels == null ) {
    System.out.println( "Page labels missing" );
}


I'm getting "Page labels missing" on each document.
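
For context, here is a rough sketch of what a page label is, as I understand the PDF specification: the document catalog may carry a /PageLabels entry that splits the document into ranges, each with an optional prefix (/P), a numbering style (/S) and a start value (/St). If that entry is absent, I would expect getPageLabels() to return null as above, and "TOC-1" would then exist only as text drawn on the page. The names below (PageLabelSketch, Range, labels) are made up for illustration and do not use PDFBox; only the decimal and Roman styles are handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class PageLabelSketch {

    /** Hypothetical stand-in for one page label range (keys /P, /S, /St). */
    static final class Range {
        final int firstPage;  // 0-based index of the first page in this range
        final String prefix;  // /P, for example "TOC-"
        final char style;     // /S: 'D' decimal, 'R' or 'r' Roman; anything else means prefix only
        final int start;      // /St: number given to the first page of the range

        Range(int firstPage, String prefix, char style, int start) {
            this.firstPage = firstPage;
            this.prefix = prefix;
            this.style = style;
            this.start = start;
        }
    }

    /** Converts a positive number to upper-case Roman numerals. */
    static String roman(int n) {
        int[] values = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
        String[] symbols = {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            while (n >= values[i]) {
                sb.append(symbols[i]);
                n -= values[i];
            }
        }
        return sb.toString();
    }

    /** Formats the numeric part of a label for the given style. */
    static String number(char style, int n) {
        switch (style) {
            case 'D': return Integer.toString(n);
            case 'R': return roman(n);
            case 'r': return roman(n).toLowerCase(Locale.ROOT);
            default:  return ""; // no style: the label is the prefix alone
        }
    }

    /** Computes one label per page; ranges must be sorted by firstPage. */
    static String[] labels(List<Range> ranges, int pageCount) {
        String[] out = new String[pageCount];
        for (int page = 0; page < pageCount; page++) {
            Range current = null;
            for (Range r : ranges) {
                if (r.firstPage <= page) {
                    current = r; // the last range starting at or before this page wins
                }
            }
            if (current != null) {
                out[page] = current.prefix + number(current.style, current.start + page - current.firstPage);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Range> ranges = Arrays.asList(
                new Range(0, "TOC-", 'D', 1),
                new Range(2, "Page ", 'D', 1));
        System.out.println(Arrays.toString(labels(ranges, 4)));
        // prints [TOC-1, TOC-2, Page 1, Page 2]
    }
}
```

So a document labelled this way would need two ranges in its catalog. PDFDebugger should show whether such an entry exists in these files.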

I have no knowledge of, or control over, the process used to convert the
Word files into PDFs. I just inherited a bunch of PDFs that I'm trying to
interpret.
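
If the labels really are absent, one fallback might be to look for the printed label in each page's extracted text (for example, text obtained with PDFTextStripper). This is only a rough sketch, assuming the label sits on a line of its own and looks like "TOC-1" or "Page 1". The class name, method name and pattern are my own guesses, not anything from PDFBox, and the page text is passed in as a plain String.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrintedLabelSketch {

    // Assumed label shapes: "TOC-<n>" or "Page <n>", alone on a line.
    private static final Pattern LABEL = Pattern.compile(
            "^[ \\t]*(TOC-\\d+|Page \\d+)[ \\t]*$", Pattern.MULTILINE);

    /**
     * Returns the last label-like line in the page text, if any.
     * Taking the last match assumes the label is printed in a footer.
     */
    static Optional<String> findPrintedLabel(String pageText) {
        Matcher m = LABEL.matcher(pageText);
        String last = null;
        while (m.find()) {
            last = m.group(1);
        }
        return Optional.ofNullable(last);
    }

    public static void main(String[] args) {
        String pageText = "Contents\nChapter 1 .... 3\nTOC-1\n";
        System.out.println(findPrintedLabel(pageText).orElse("(none)"));
        // prints TOC-1
    }
}
```

A mention such as "see Page 3 for details" inside a sentence is not matched, because the pattern requires the label to fill the whole line. Headers, footers in other formats, or labels split across lines would need a different pattern.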

Dave Patterson

On Mon, May 15, 2017 at 1:57 PM, Tilman Hausherr <[email protected]>
wrote:

> On 15.05.2017 at 19:11, David Patterson wrote:
>
>> Alas, after testing with my documents, the PageLabels is null. :-(
>>
>
> But you said it has "TOC-1". That sounds like page labels. You can also try
> PDFDebugger; it will show the labels if there are any.
>
> Tilman
>
>
>
>> Thank you for the help and encouragement.
>>
>> Dave Patterson
>>
>> On Mon, May 15, 2017 at 12:34 PM, Tilman Hausherr <[email protected]>
>> wrote:
>>
>> On 15.05.2017 at 18:30, David Patterson wrote:
>>>
>>>> Tilman,
>>>>
>>>> Thank you very much. (I feel bad asking some of the questions, but the
>>>> data is stored in "out of the way" corners that are hard to find.)
>>>
>>> Don't :-)
>>>
>>>
>>>> Is there any documentation that explains how the linkages work? Would it
>>>> help to have the PDF Standard Document?
>>>>
>>> Yes. I read it all the time. The PDFBox API closely follows the PDF
>>> specification. Here the page labels are linked from the document catalog,
>>> so the methods used are in the PDDocumentCatalog class. But asking was a
>>> good decision, as it got you that convenience method (which is in
>>> PDFDebugger).
>>>
>>> Tilman
>>>
>>>
>>>
>>>> Thanks.
>>>>
>>>> Dave Patterson
>>>>
>>>> On Mon, May 15, 2017 at 12:13 PM, Tilman Hausherr <
>>>> [email protected]>
>>>> wrote:
>>>>
>>>> On 15.05.2017 at 15:20, David Patterson wrote:
>>>>
>>>>>> I've now got my code working to iterate through a PDDocument and
>>>>>> process it page by page.
>>>>>>
>>>>>> Next hurdle: Is there a way to get the page number as printed? I've got
>>>>>> page numbers like "TOC-1", "TOC-2", "Page 1", ...
>>>>>>
>>>>>> How much work is it to get the "TOC-1"?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Dave Patterson
>>>>>>
>>>>>>
>>>>>     /**
>>>>>      * Convenience method to get the page label if available.
>>>>>      *
>>>>>      * @param document
>>>>>      * @param pageIndex 0-based page number.
>>>>>      * @return a page label or null if not available.
>>>>>      */
>>>>>     public static String getPageLabel(PDDocument document, int pageIndex)
>>>>>     {
>>>>>         PDPageLabels pageLabels;
>>>>>         try
>>>>>         {
>>>>>             pageLabels = document.getDocumentCatalog().getPageLabels();
>>>>>         }
>>>>>         catch (IOException ex)
>>>>>         {
>>>>>             return ex.getMessage();
>>>>>         }
>>>>>         if (pageLabels != null)
>>>>>         {
>>>>>             String[] labels = pageLabels.getLabelsByPageIndices();
>>>>>             if (labels[pageIndex] != null)
>>>>>             {
>>>>>                 return labels[pageIndex];
>>>>>             }
>>>>>         }
>>>>>         return null;
>>>>>     }
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>>
>>>>>
>>>>>
