Hello,
I noticed that the behavior differs depending on the file format. Link
extraction works quite well with PDF, but we have issues with DOCX (Microsoft
Word OOXML documents): Tika doesn't extract any links from DOCX. Is this a
known issue?
Thanks a lot.
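
To check whether the link is actually stored in the DOCX, independently of Tika, here is a minimal Python sketch. It is not Tika's API; it only assumes the usual OOXML layout, where external hyperlinks of the main document body are listed as relationships in word/_rels/document.xml.rels. The function name docx_hyperlink_targets is hypothetical, and links in headers, footers or Strict-format files are not covered.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace and relationship type used by (transitional) OOXML packages.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def docx_hyperlink_targets(docx, rels_part="word/_rels/document.xml.rels"):
    """Return the targets of hyperlink relationships of the main document part.

    `docx` is a path or a binary file-like object (hypothetical helper,
    assumed layout as described above).
    """
    with zipfile.ZipFile(docx) as archive:
        # A DOCX without any relationship part has no external links here.
        if rels_part not in archive.namelist():
            return []
        root = ET.fromstring(archive.read(rels_part))
    # Keep only relationships whose type marks them as hyperlinks.
    return [
        rel.get("Target")
        for rel in root.iter(f"{{{RELS_NS}}}Relationship")
        if rel.get("Type") == HYPERLINK_TYPE
    ]
```

If this lists the expected URL for the file, the link is present in the document and the problem is on the extraction side.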
://tika.apache.org/1.19/api/org/apache/tika/sax/LinkContentHandler.html
-Original message-
> From:Nick Burch
> Sent: Thursday 12th November 2020 12:55
> To: nensick
> Cc: user@tika.apache.org
> Subject: Re: Extract URLs from a document
>
> On Wed, 11 Nov 2020, nensick wrote:
I am exploring the available features and I also managed to extract
Office macros, but I still can't find a way to get the links.
Imagine you have a PDF or a DOCX in which a "click here" text is a link
pointing to a website (let's say example[.]com). Ho
Hello,
I am a new Tika user and it is amazing, so thanks for your effort.