Hi,
First, sorry if this is not the correct way to post; I was unable to locate the
PDFBox forum other than through MarkMail.
Second, I know there are two posts on this topic, and I've read both of them:
Getting page number for bookmarks (BM-Thread)
How to get Page Number from a PDPage (PDP-Thread)
I'm using Adobe Acrobat 8 Pro.
I am using PDFBox on the Hibernate Search in Action PDF, but I think this
problem would be the same for other book PDFs.
When I open a PDF in Acrobat, there are two indications of which page I am on:
[pageN] (page# of total#). In Acrobat these two numbers DON'T always match.
In fact, [pageN] matches the number you would see on the printed page, while
(page# of total#) counts the sheets of paper in the book,
i.e. including all the pages before page 1: TOC, copyright ...
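For background, the PDF format lets a document define page labels: ranges of
sheets, each with its own numbering style and starting number. That is most
likely where Acrobat's [pageN] value comes from. Below is a minimal plain-Java
model of the idea, not PDFBox code. The class and method names are made up for
illustration, and the layout (10 front-matter sheets in roman numerals, then
printed page 1) is hypothetical, not read from this book.

```java
import java.util.Arrays;
import java.util.List;

public class PageLabelSketch {

    /** One labelling range, beginning at a 0-based sheet index. */
    static final class LabelRange {
        final int startIndex;   // 0-based index of the first sheet in this range
        final boolean roman;    // true: i, ii, iii ...; false: 1, 2, 3 ...
        final int firstNumber;  // number given to the first sheet of the range

        LabelRange(int startIndex, boolean roman, int firstNumber) {
            this.startIndex = startIndex;
            this.roman = roman;
            this.firstNumber = firstNumber;
        }
    }

    /** Converts a positive number below 4000 to lower-case roman numerals. */
    static String toRoman(int n) {
        int[] values = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
        String[] symbols = {"m", "cm", "d", "cd", "c", "xc", "l", "xl", "x", "ix", "v", "iv", "i"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            while (n >= values[i]) {
                sb.append(symbols[i]);
                n -= values[i];
            }
        }
        return sb.toString();
    }

    /**
     * Returns the printed label for a 1-based sheet number.
     * Ranges must be sorted by startIndex and the first one must start at 0.
     */
    static String labelFor(List<LabelRange> ranges, int sheetNumber) {
        int index = sheetNumber - 1;
        LabelRange active = ranges.get(0);
        for (LabelRange r : ranges) {
            if (r.startIndex <= index) {
                active = r;  // the last range starting at or before this sheet wins
            }
        }
        int n = active.firstNumber + (index - active.startIndex);
        return active.roman ? toRoman(n) : Integer.toString(n);
    }

    /** Hypothetical layout: sheets 1-10 are labelled i-x, sheet 11 is printed page 1. */
    static List<LabelRange> exampleRanges() {
        return Arrays.asList(new LabelRange(0, true, 1), new LabelRange(10, false, 1));
    }

    public static void main(String[] args) {
        List<LabelRange> ranges = exampleRanges();
        for (int sheet : new int[] {1, 8, 10, 11, 30}) {
            System.out.println("[" + labelFor(ranges, sheet) + "] (" + sheet + " of 30)");
        }
    }
}
```

With this layout, sheet 8 is labelled viii and sheet 30 is labelled 20, which is
the kind of mismatch between the two numbers described above.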
It seems that when I use the code in the BM-Thread, references to the first 21
sheets all indicate page 1, and thereafter this approach gives the [pageN]
value, i.e. the number that you would see on the printed page of the book.
With the BM-Thread code I see the following problems:
1. Bookmarks have PDAction = null.
2. The book is divided into Parts/Chapters/SubChapters/... and there are
   bookmarks for each. However, only the Parts have a PDAction, and each Part
   shows the same page (page = 1).
3. The Index bookmarks don't show the [pageN] value; they show the sheet number.
When I use the code from the PDP-Thread, I get a number which is basically the
sheet count, i.e. page# from (page# of total#) above, except that it doesn't
work once you get to the book's Index.
It seems like it would be easier if I could get a PDPageDestination from the
PDOutlineItem and then call pdpageDest.getPageNumber(), but the type coming
back from PDOutlineItem.getDestination() is PDNamedDestination.
Basically, I just modified PrintBookmarks.java to get these results:
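For context, a named destination is an indirection: the bookmark stores only a
name (such as G3.998410), and the document keeps a separate table that maps each
name to an explicit page destination. The following is a plain-Java model of
that two-step lookup, not PDFBox code. The class name, method name, destination
names and object numbers are all hypothetical; page keys use the same
"objNum,genNum" form as the BM-Thread code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NamedDestinationSketch {

    /**
     * Resolves a destination name to a 1-based sheet number.
     *
     * @param nameTable       maps a destination name to the "objNum,genNum" key of its page
     * @param pageKeysInOrder the page keys in document order
     * @return the sheet number, or -1 if the name or its page is unknown
     */
    static int sheetNumberFor(String name, Map<String, String> nameTable,
                              List<String> pageKeysInOrder) {
        // Step 1: name -> page object key
        String pageKey = nameTable.get(name);
        if (pageKey == null) {
            return -1;
        }
        // Step 2: page object key -> position in the page list
        int index = pageKeysInOrder.indexOf(pageKey);
        return index < 0 ? -1 : index + 1;
    }

    public static void main(String[] args) {
        // Hypothetical three-page document.
        List<String> pages = Arrays.asList("100,0", "101,0", "102,0");

        // Hypothetical name table.
        Map<String, String> names = new HashMap<>();
        names.put("destA", "100,0");
        names.put("destB", "102,0");

        System.out.println("destA -> sheet " + sheetNumberFor("destA", names, pages));
        System.out.println("destB -> sheet " + sheetNumberFor("destB", names, pages));
        System.out.println("missing -> " + sheetNumberFor("missing", names, pages));
    }
}
```

Note that this lookup yields the sheet number, not the printed [pageN] label.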
    while (current != null) {
        dest = current.getDestination();
        pdAction = current.getAction();
        if (pdAction != null) {
            // From BM-Thread: look up the target page object by "objNum,genNum"
            COSObject targetPageRef = (COSObject) ((COSArray) pdAction
                    .getCOSDictionary()
                    .getDictionaryObject("D")).get(0);
            String objStr = String.valueOf(targetPageRef.getObjectNumber()
                    .intValue());
            String genStr = String.valueOf(targetPageRef
                    .getGenerationNumber().intValue());
            szKey = objStr + "," + genStr;
            pageNumber = (Integer) getPageMap().get(szKey);
        } else if (dest != null) {
            // From PDP-Thread: position of the destination page among all pages
            PDPage pdp = current.findDestinationPage(document);
            List allpages = new ArrayList();
            document.getDocumentCatalog().getPages().getAllKids(allpages);
            pageNum = allpages.indexOf(pdp) + 1;
            if (dest instanceof PDNamedDestination) {
                szDest = ((PDNamedDestination) dest).getNamedDestination();
            }
        }
        System.out.println(indentation + current.getTitle() + " ... page: "
                + pageNumber + " key: " + szKey + " dest: " + szDest
                + " pageNum: " + pageNum);
        printBookmark(current, indentation + " ");
        current = current.getNextSibling();
    }
Sample output:
Hibernate Search ... page: 1 key: 30269,0 dest: null pageNum: null
contents ... page: 1 key: 30269,0 dest: G1.558638 pageNum: 8
preface ... page: 1 key: 30269,0 dest: G2.552675 pageNum: 16
acknowledgments ... page: 1 key: 30269,0 dest: G2.557271 pageNum: 18
about this book ... page: 1 key: 30269,0 dest: G2.557499 pageNum: 20
Part 1 Understanding Search Technology ... page: 1 key: 30269,0 dest: G3.1005308 pageNum: 26
Chapter 1 State of the art ... page: null key: null dest: G3.998410 pageNum: 28
1.1 What is search? ... page: null key: null dest: G3.998485 pageNum: 29
...
Part 5 Native Lucene, scoring, and the wheel ... page: 1 key: 30269,0 dest: G14.1040306 pageNum: 376
Chapter 12 Document ranking ... page: null key: null dest: G14.1023742 pageNum: 378
...
13.4 Summary ... page: null key: null dest: G15.1021300 pageNum: 465
appendix: Quick reference ... page: 1 key: 30269,0 dest: G16.998406 pageNum: 466
Hibernate Search mapping annotations ... page: null key: null dest: G16.998426 pageNum: 466
Hibernate Search APIs ... page: null key: null dest: G16.999345 pageNum: 468
Lucene queries ... page: null key: null dest: G16.1001842 pageNum: 473
index ... page: 1 key: 30269,0 dest: G17.174043 pageNum: 476
Symbols ... page: 476 key: 3672,0 dest: null pageNum: null
Numerics ... page: 476 key: 3672,0 dest: null pageNum: null
A ... page: 476 key: 3672,0 dest: null pageNum: null
B ... page: 476 key: 3672,0 dest: null pageNum: null
...
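One thing worth ruling out when reading this output: in the loop above, a single
bookmark goes through either the action branch or the destination branch, never
both, yet "contents" prints a page/key pair and a dest/pageNum pair. That would
be consistent with pageNumber and szKey being declared outside the loop and
keeping the value set by the previous sibling ("Hibernate Search"). The
declarations are not shown, so this is an assumption. A minimal plain-Java
sketch of resetting the values for every bookmark, using a hypothetical
Bookmark stand-in rather than PDFBox classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PerBookmarkStateSketch {

    /** Hypothetical stand-in for an outline item; either page value may be null. */
    static final class Bookmark {
        final String title;
        final Integer actionPage;  // page resolved through an action, or null
        final Integer destPage;    // page resolved through a destination, or null

        Bookmark(String title, Integer actionPage, Integer destPage) {
            this.title = title;
            this.actionPage = actionPage;
            this.destPage = destPage;
        }
    }

    /** Formats one line per sibling without carrying values between iterations. */
    static List<String> describe(List<Bookmark> siblings) {
        List<String> lines = new ArrayList<>();
        for (Bookmark b : siblings) {
            // Declared inside the loop, so nothing leaks from the previous sibling.
            Integer pageNumber = null;
            Integer pageNum = null;
            if (b.actionPage != null) {
                pageNumber = b.actionPage;
            } else if (b.destPage != null) {
                pageNum = b.destPage;
            }
            lines.add(b.title + " ... page: " + pageNumber + " pageNum: " + pageNum);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Values taken from the first two lines of the sample output.
        List<Bookmark> top = Arrays.asList(
                new Bookmark("Hibernate Search", 1, null),
                new Bookmark("contents", null, 8));
        for (String line : describe(top)) {
            System.out.println(line);
        }
    }
}
```

If that assumption holds, only "Hibernate Search" and the Index entries would
actually have an action, and the repeated "page: 1 key: 30269,0" on the other
top-level bookmarks would be leftover values.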
Tim Reynolds