Tilman, Thanks. That was what I had come to realize when the PageLabels were null.
Just out of curiosity, how do page labels get created?

Dave Patterson

On Tue, May 16, 2017 at 9:26 AM, Tilman Hausherr <[email protected]> wrote:

> Sadly for you, that one has nothing to do with page labels. It's really
> just a footer on the page. And there is no concept of "footer" in PDF. It's
> just text at the bottom.
>
> Tilman
>
> On 16.05.2017 at 15:21, David Patterson wrote:
>
>> They show up when I print the PDF or open it to read it. I want to extract
>> the Table of Contents from each of more than 100 PDFs so I can make a
>> super-Table of Contents and allow users to search for the document they
>> need to read. (The file name of the desired contents is not obvious, and
>> so with a consolidated Table of Contents, a more novice user can find the
>> content they want to read and open the correct document to see the text.
>> These are Standard Operating Procedures for a 24x7 production facility and
>> the operators might need to review what to do in case of a problem.)
>>
>> I was hoping that in the transition from Word (where the documents are
>> authored), through saving as a PDF and combining them into Portfolios,
>> some part of the process would have identified it as a page label, but I
>> guess that did not happen.
>>
>> I'm able to find the text of that string since it only occurs in the
>> footer of the page.
>>
>> Thanks.
>>
>> Dave Patterson
>>
>> On Tue, May 16, 2017 at 8:42 AM, Tilman Hausherr <[email protected]> wrote:
>>
>>> On 16.05.2017 at 14:35, David Patterson wrote:
>>>
>>>> Tilman,
>>>>
>>>> The code I tried is:
>>>>
>>>> byte[] bytes = // content of file as a byte array
>>>> PDDocument pdDocument = PDDocument.load( bytes );
>>>> PDDocumentCatalog cat2 = pdDocument.getDocumentCatalog();
>>>> PDPageLabels pageLabels = cat2.getPageLabels();
>>>> if ( pageLabels == null ) {
>>>>     System.out.println( "Page labels missing " );
>>>> }
>>>>
>>>> I'm getting "Page labels missing" on each document.
>>>
>>> Then let's go back to the beginning. You mentioned "I've got page numbers
>>> like "TOC-1", "TOC-2", "Page 1"". Where did these show up?
>>>
>>> Tilman
>>>
>>>> I have no idea of, or control over, the process used to convert a Word
>>>> file into a PDF. I just inherited a bunch of PDFs that I'm trying to
>>>> interpret.
>>>>
>>>> Dave Patterson
>>>>
>>>> On Mon, May 15, 2017 at 1:57 PM, Tilman Hausherr <[email protected]> wrote:
>>>>
>>>>> On 15.05.2017 at 19:11, David Patterson wrote:
>>>>>
>>>>>> Alas, after testing with my documents, the PageLabels is null. :-(
>>>>>
>>>>> But you said it has "TOC-1". This sounds like page labels. You can also
>>>>> try with PDFDebugger; it will show the labels if there are any.
>>>>>
>>>>> Tilman
>>>>>
>>>>>> Thank you for the help and encouragement.
>>>>>>
>>>>>> Dave Patterson
>>>>>>
>>>>>> On Mon, May 15, 2017 at 12:34 PM, Tilman Hausherr <[email protected]> wrote:
>>>>>>
>>>>>>> On 15.05.2017 at 18:30, David Patterson wrote:
>>>>>>>
>>>>>>>> Tilman,
>>>>>>>>
>>>>>>>> Thank you very much. (I feel bad asking some of the questions, but
>>>>>>>> the data is stored in "out of the way" corners that are hard to
>>>>>>>> find.)
>>>>>>>
>>>>>>> Don't :-)
>>>>>>>
>>>>>>>> Is there any documentation that explains how the linkages work?
>>>>>>>> Would it help to have the PDF Standard Document?
>>>>>>>
>>>>>>> Yes. I read it all the time. The PDFBox API closely follows the PDF
>>>>>>> specification. So here it's linked from the document catalog, so the
>>>>>>> methods used are in the PDDocumentCatalog class. But asking was a
>>>>>>> good decision as this got you that convenience method (that is in
>>>>>>> PDFDebugger).
>>>>>>>
>>>>>>> Tilman
>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Dave Patterson
>>>>>>>>
>>>>>>>> On Mon, May 15, 2017 at 12:13 PM, Tilman Hausherr <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On 15.05.2017 at 15:20, David Patterson wrote:
>>>>>>>>>
>>>>>>>>>> I've now got my code working to iterate through a PDDocument and
>>>>>>>>>> process it page by page.
>>>>>>>>>>
>>>>>>>>>> Next hurdle: Is there a way to get the page number as printed?
>>>>>>>>>> I've got page numbers like "TOC-1", "TOC-2", "Page 1", ...
>>>>>>>>>>
>>>>>>>>>> How much work is it to get the "TOC-1"?
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Dave Patterson
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>>  * Convenience method to get the page label if available.
>>>>>>>>>  *
>>>>>>>>>  * @param document
>>>>>>>>>  * @param pageIndex 0-based page number.
>>>>>>>>>  * @return a page label or null if not available.
>>>>>>>>>  */
>>>>>>>>> public static String getPageLabel(PDDocument document, int pageIndex)
>>>>>>>>> {
>>>>>>>>>     PDPageLabels pageLabels;
>>>>>>>>>     try
>>>>>>>>>     {
>>>>>>>>>         pageLabels = document.getDocumentCatalog().getPageLabels();
>>>>>>>>>     }
>>>>>>>>>     catch (IOException ex)
>>>>>>>>>     {
>>>>>>>>>         return ex.getMessage();
>>>>>>>>>     }
>>>>>>>>>     if (pageLabels != null)
>>>>>>>>>     {
>>>>>>>>>         String[] labels = pageLabels.getLabelsByPageIndices();
>>>>>>>>>         if (labels[pageIndex] != null)
>>>>>>>>>         {
>>>>>>>>>             return labels[pageIndex];
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>>     return null;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>> For additional commands, e-mail: [email protected]
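
On the closing question of how page labels get created: in the PDF format they are not text on the page. They are ranges stored in the document catalog's /PageLabels entry, written by whatever tool produces the PDF if it chooses to. Each range starts at a 0-based page index and carries a numbering style (decimal, roman, letters), an optional prefix, and a start number. The following is only a minimal sketch of that idea in plain Java, not PDFBox code: the names PageLabelSketch, LabelRange, labelsFor and demoLabels are hypothetical, and only the decimal style is handled. In PDFBox the corresponding classes should be PDPageLabels and PDPageLabelRange.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of how page-label ranges expand into labels.
// This models the concept only; it does not read or write a PDF.
class PageLabelSketch {

    /** One labelling range: a prefix plus the number given to its first page. */
    static final class LabelRange {
        final String prefix;
        final int start;

        LabelRange(String prefix, int start) {
            this.prefix = prefix;
            this.start = start;
        }
    }

    /**
     * Expands ranges, keyed by the 0-based index of their first page,
     * into one label per page. Decimal numbering only.
     */
    static List<String> labelsFor(TreeMap<Integer, LabelRange> ranges, int pageCount) {
        List<String> labels = new ArrayList<>();
        for (int page = 0; page < pageCount; page++) {
            // The range that applies is the last one starting at or before this page.
            Map.Entry<Integer, LabelRange> entry = ranges.floorEntry(page);
            if (entry == null) {
                labels.add(null); // no range covers this page
                continue;
            }
            LabelRange range = entry.getValue();
            int number = range.start + (page - entry.getKey());
            labels.add(range.prefix + number);
        }
        return labels;
    }

    /** Two ranges that would yield the labels mentioned in the thread. */
    static List<String> demoLabels() {
        TreeMap<Integer, LabelRange> ranges = new TreeMap<>();
        ranges.put(0, new LabelRange("TOC-", 1));  // pages 0 and 1
        ranges.put(2, new LabelRange("Page ", 1)); // pages 2 onward
        return labelsFor(ranges, 4);
    }

    public static void main(String[] args) {
        System.out.println(demoLabels()); // prints [TOC-1, TOC-2, Page 1, Page 2]
    }
}
```

A document whose catalog has no such ranges, like the ones discussed above, returns null from getPageLabels() even when every page prints "TOC-1" or "Page 1" in its footer, because that footer is ordinary page text.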

