Tilman,
The code I tried is:
byte[] bytes = ...; // content of file as a byte array
PDDocument pdDocument = PDDocument.load( bytes );
PDDocumentCatalog cat2 = pdDocument.getDocumentCatalog();
PDPageLabels pageLabels = cat2.getPageLabels();
if ( pageLabels == null ) {
    System.out.println( "Page labels missing " );
}
I'm getting "Page labels missing" for every document.
I have no knowledge of, or control over, the process used to convert the Word
files into PDFs. I just inherited a bunch of PDFs that I'm trying to interpret.
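
For context, here is a minimal sketch of how I understand printed page labels to be derived when a document does carry them. As far as I can tell from the PDF specification, the catalog's /PageLabels entry maps the first page index of each range to a style (/S), a prefix (/P) and a start number (/St). The class PageLabelSketch, its Range holder and labelFor are hypothetical names of my own, not PDFBox API, and only the decimal style is handled (roman and letter styles are left out).

```java
import java.util.Map;
import java.util.TreeMap;

public class PageLabelSketch {

    /** Hypothetical holder for one label range: style (/S), prefix (/P), start (/St). */
    public static final class Range {
        final String style;   // "D" for decimal, or null for "prefix only"
        final String prefix;  // text placed in front of the number, may be empty
        final int start;      // number given to the first page of the range

        public Range(String style, String prefix, int start) {
            this.style = style;
            this.prefix = prefix;
            this.start = start;
        }
    }

    /** Returns the label for a 0-based page index, or null if no range covers it. */
    public static String labelFor(TreeMap<Integer, Range> ranges, int pageIndex) {
        // The range that applies is the one with the greatest start index <= pageIndex.
        Map.Entry<Integer, Range> entry = ranges.floorEntry(pageIndex);
        if (entry == null) {
            return null;
        }
        Range range = entry.getValue();
        if (range.style == null) {
            return range.prefix; // no numeric part, the label is just the prefix
        }
        int number = range.start + (pageIndex - entry.getKey());
        return range.prefix + number;
    }

    public static void main(String[] args) {
        // Assumed layout: two TOC pages, then the body pages.
        TreeMap<Integer, Range> ranges = new TreeMap<>();
        ranges.put(0, new Range("D", "TOC-", 1));
        ranges.put(2, new Range("D", "Page ", 1));
        for (int i = 0; i < 4; i++) {
            System.out.println(i + " -> " + labelFor(ranges, i));
        }
        // prints: 0 -> TOC-1, 1 -> TOC-2, 2 -> Page 1, 3 -> Page 2 (one per line)
    }
}
```

If the PDFs have no such entry in the catalog, the "TOC-1" text would presumably exist only as ordinary page content, which would be consistent with getPageLabels() returning null.
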
Dave Patterson
On Mon, May 15, 2017 at 1:57 PM, Tilman Hausherr <[email protected]>
wrote:
> On 15.05.2017 at 19:11, David Patterson wrote:
>
>> Alas, after testing with my documents, the PageLabels is null. :-(
>>
>
> But you said it has "TOC-1". That sounds like page labels. You can also try
> PDFDebugger; it will show the labels if there are any.
>
> Tilman
>
>
>
>> Thank you for the help and encouragement.
>>
>> Dave Patterson
>>
>> On Mon, May 15, 2017 at 12:34 PM, Tilman Hausherr <[email protected]>
>> wrote:
>>
>>> On 15.05.2017 at 18:30, David Patterson wrote:
>>>
>>>> Tilman,
>>>>
>>>> Thank you very much. (I feel bad asking some of the questions, but the
>>>> data is stored in "out of the way" corners that are hard to find.)
>>>>
>>> Don't :-)
>>>
>>>> Is there any documentation that explains how the linkages work? Would it
>>>> help to have the PDF Standard Document?
>>>>
>>> Yes. I read it all the time. The PDFBox API closely follows the PDF
>>> specification. Page labels are linked from the document catalog, so the
>>> methods to use are in the PDDocumentCatalog class. But asking was a good
>>> decision, as it got you that convenience method (which is from
>>> PDFDebugger).
>>>
>>> Tilman
>>>
>>>
>>>
>>>> Thanks.
>>>>
>>>> Dave Patterson
>>>>
>>>> On Mon, May 15, 2017 at 12:13 PM, Tilman Hausherr <[email protected]>
>>>> wrote:
>>>>
>>>>> On 15.05.2017 at 15:20, David Patterson wrote:
>>>>>
>>>>>> I've now got my code working to iterate through a PDDocument and
>>>>>> process it page by page.
>>>>>>
>>>>>> Next hurdle: Is there a way to get the page number as printed? I've
>>>>>> got page numbers like "TOC-1", "TOC-2", "Page 1", ...
>>>>>>
>>>>>> How much work is it to get the "TOC-1"?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Dave Patterson
>>>>>>
>>>>> /**
>>>>>  * Convenience method to get the page label if available.
>>>>>  *
>>>>>  * @param document
>>>>>  * @param pageIndex 0-based page number.
>>>>>  * @return a page label or null if not available.
>>>>>  */
>>>>> public static String getPageLabel(PDDocument document, int pageIndex)
>>>>> {
>>>>>     PDPageLabels pageLabels;
>>>>>     try
>>>>>     {
>>>>>         pageLabels = document.getDocumentCatalog().getPageLabels();
>>>>>     }
>>>>>     catch (IOException ex)
>>>>>     {
>>>>>         return ex.getMessage();
>>>>>     }
>>>>>     if (pageLabels != null)
>>>>>     {
>>>>>         String[] labels = pageLabels.getLabelsByPageIndices();
>>>>>         if (labels[pageIndex] != null)
>>>>>         {
>>>>>             return labels[pageIndex];
>>>>>         }
>>>>>     }
>>>>>     return null;
>>>>> }
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]