Extract link annotations (hyperlinks) with tika app?

Svensson, Kristian Thu, 28 Feb 2019 01:56:43 -0800

Using the tika app (tika-app-1.20.jar), is it possible to extract link 
annotations (hyperlinks)? Ideally, I would like to get an a href element for 
each link in the XHTML output. I could not find any documentation on this.
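
To make the goal concrete, here is a minimal sketch of how I would consume the 
output if each link appeared as an a element with an href attribute. It uses 
only the JDK's XML parser. The class name HrefLister, the method extractHrefs 
and the sample XHTML string are made up for illustration; the sample is not 
actual tika output, and whether the tika app emits links in this form is 
exactly what I am unsure about.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists the link targets found in an XHTML document.
class HrefLister {

    // Returns the href value of every <a> element that has one.
    static List<String> extractHrefs(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // XHTML elements live in a namespace, so parse namespace-aware.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // "*" matches any namespace; "a" is the local element name.
        NodeList anchors = doc.getElementsByTagNameNS("*", "a");
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            if (anchor.hasAttribute("href")) {
                hrefs.add(anchor.getAttribute("href"));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input, assumed shape only (not real tika output).
        String sample =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<p>See <a href=\"https://example.org/docs\">the docs</a>.</p>"
            + "</body></html>";
        for (String href : extractHrefs(sample)) {
            System.out.println(href);
        }
    }
}
```

If the links do end up in the XHTML output, the sample string would simply be 
replaced by that output.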


I found out that PDFBox can extract link annotations:
https://stackoverflow.com/questions/38587567/how-to-extract-hyperlink-information-pdfbox
But I'm not sure how to use this with the tika app.

I think the tika app uses PDFBox for PDF content extraction, but I might be 
wrong 😊

Any help greatly appreciated!

Best Regards,

Kristian
