I've been tasked with traversing PDF files for embedded links, i.e. hyperlinks added in WinWord and what have you. Because URLs can be embedded in PDFs in several different ways, with varying nesting levels and orderings, it's proving to be a challenge.
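For comparison, if the links in question are ordinary link annotations (which is usually how hyperlinks exported from Word end up, though other producers may differ), PDFBox's high-level annotation classes can find them without touching COS objects directly. This is a minimal sketch that assumes PDFBox 2.x (`PDDocument.load` was replaced by `Loader.loadPDF` in 3.x). The class and method names `LinkLister` and `listUris` are hypothetical. It only collects links whose action is a URI action; links that use `/Dest` or other action types would need separate handling.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.action.PDAction;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionURI;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;

// Hypothetical helper class; assumes the PDFBox 2.x API.
class LinkLister {

    /** Collects the URI of every link annotation whose action is a URI action. */
    public static List<String> listUris(PDDocument doc) throws IOException {
        List<String> uris = new ArrayList<>();
        for (PDPage page : doc.getPages()) {
            for (PDAnnotation annotation : page.getAnnotations()) {
                // Skip widgets, text notes, etc.; only /Subtype /Link is of interest.
                if (!(annotation instanceof PDAnnotationLink)) {
                    continue;
                }
                PDAction action = ((PDAnnotationLink) annotation).getAction();
                // getAction() returns null when there is no /A entry (e.g. a /Dest link).
                if (action instanceof PDActionURI) {
                    uris.add(((PDActionURI) action).getURI());
                }
            }
        }
        return uris;
    }

    public static void main(String[] args) throws IOException {
        // PDDocument.load(File) is the PDFBox 2.x entry point.
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            for (String uri : listUris(doc)) {
                System.out.println(uri);
            }
        }
    }
}
```

Because every cast here is guarded by `instanceof`, none of them can throw a `ClassCastException`.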
I can see in my IDE that the URL I need is nested deep inside the COSObject's baseObject property, but baseObject is private, so I can't call toString() on it. To make matters worse, toString() on the COSObject itself returns only the top-level object rather than its entire contents. If there were a way to get the stringified contents of baseObject, I would be done with this by now.

My example method is a bit long for email, so I put it on pastebin. It compiles, but it's borderline pseudocode, so there's no need to look at it unless the rest of this email is unclear: http://pastebin.com/LvXu0tNh

I'm curious what to do after getting an object from an array like this:

    COSObject obj = (COSObject) cosArr.get(i);

At this point I can see obj's baseObject, as I said, but to get at it I have to attempt all sorts of casts to COSDictionary, catch the ClassCastExceptions and try again, with null checks on the retrieved objects at every step. Am I doing this wrong? I haven't been able to find a pattern for this that isn't ugly as sin. If anyone could point me to examples of how COSObject.getItem() is supposed to be used in different situations, that would be great.

Finally, assuming there really is no way to avoid all this casting, are there any casts in PDFBox that I can be sure will not throw a ClassCastException? Specifically, casts to COSArray and COSObject.

Sorry for the haphazard nature of this email; I'm still trying to pin down exactly which parts of this I don't understand.
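One possible pattern for the casting question, sketched under the assumption of PDFBox 2.x (the same calls exist in 1.8 and 3.x): check the type with `instanceof` before each cast instead of catching `ClassCastException`, and let PDFBox dereference indirect objects for you. `COSObject.getObject()` is the public accessor for the wrapped object (the private `baseObject`), and `COSDictionary.getDictionaryObject(...)` resolves a `COSObject` to its target automatically, as does `COSArray.getObject(int)` (which also returns null for `COSNull`), whereas `COSArray.get(int)` does not. The class `CosHelpers` and its methods `resolve`, `asDictionary` and `uriOf` are hypothetical names, and `uriOf` assumes the usual `/A << /S /URI /URI (...) >>` layout of a link annotation.

```java
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSObject;
import org.apache.pdfbox.cos.COSString;

// Hypothetical helpers: instanceof checks instead of try/catch around casts.
final class CosHelpers {
    private static final COSName A = COSName.getPDFName("A");
    private static final COSName URI = COSName.getPDFName("URI");

    private CosHelpers() {}

    /** If base is an indirect object (COSObject), returns the object it wraps. */
    public static COSBase resolve(COSBase base) {
        if (base instanceof COSObject) {
            // getObject() is the public way to reach the private baseObject field.
            return ((COSObject) base).getObject();
        }
        return base;
    }

    /** Returns the (resolved) element as a dictionary, or null if it is anything else. */
    public static COSDictionary asDictionary(COSBase base) {
        COSBase resolved = resolve(base);
        return resolved instanceof COSDictionary ? (COSDictionary) resolved : null;
    }

    /** Reads /A /URI from a link annotation dictionary, or returns null if absent. */
    public static String uriOf(COSDictionary annotation) {
        // getDictionaryObject() dereferences indirect objects, unlike getItem().
        COSBase action = annotation.getDictionaryObject(A);
        if (!(action instanceof COSDictionary)) {
            return null;
        }
        COSBase uri = ((COSDictionary) action).getDictionaryObject(URI);
        return uri instanceof COSString ? ((COSString) uri).getString() : null;
    }
}
```

With these helpers, the loop from the question becomes `COSDictionary annot = CosHelpers.asDictionary(cosArr.get(i));` followed by a single null check, with no try/catch. As far as I can tell, since a malformed or unusual PDF may hold almost any object type at a given position, a cast is only guaranteed not to throw when it is guarded by an `instanceof` check (or the value is null).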

