Thank you, Tim! The PR looks great.

On Mon, Feb 2, 2026 at 7:18 PM Tim Allison <[email protected]> wrote:
> Thank you for raising this and sharing a triggering doc. I've opened:
> https://issues.apache.org/jira/browse/TIKA-4646
>
> On Mon, Feb 2, 2026 at 11:02 AM Mike Flester via user <[email protected]> wrote:
>
>> Hello -
>>
>> The ooxml/docx began life as a phishing email attachment. The attacker
>> hyperlink has been replaced with something benign.
>>
>> Tika did not extract the link because (I think) it's in "instructional
>> text". The document appears to work fine (the victim is able to click the
>> link).
>>
>> I have a bit of POI code (not production quality) that can dig this
>> instructional text out.
>>
>> Link to both the docx and the java code -
>> https://limewire.com/d/qtC1E#79Q8zip1SU
>>
>> $ javac -cp ~/tika/tika-app/target/lib/*:. POILinkExtractor.java
>> $ java -cp ~/tika/tika-app/target/lib/*:. POILinkExtractor missing-link.docx
>>
>> Is this something that might see a place in Tika? As an option on the
>> existing XWPFWordExtractorDecorator? Or as a new parser in that package? Or
>> would I be best doing something outside of Tika for this case?
>>
>> Thanks,
>> Mike
