Thank you, Mike. If you know more about the other features I added, please review them carefully. I trusted Claude on those and on generating the test files for DDE, etc.

On Tue, Feb 3, 2026 at 7:57 AM Mike Flester via user <[email protected]> wrote:

> Thank you Tim! The PR looks great.
>
> On Mon, Feb 2, 2026 at 7:18 PM Tim Allison <[email protected]> wrote:
>
>> Thank you for raising this and sharing a triggering doc. I've opened:
>> https://issues.apache.org/jira/browse/TIKA-4646
>>
>> On Mon, Feb 2, 2026 at 11:02 AM Mike Flester via user <
>> [email protected]> wrote:
>>
>>> Hello -
>>>
>>> The ooxml/docx began life as a phishing email attachment. The attacker
>>> hyperlink has been replaced with something benign.
>>>
>>> Tika did not extract the link because (I think) it's in "instructional
>>> text". The document appears to work fine (the victim is able to click the
>>> link).
>>>
>>> I have a bit of POI code (not production quality) that can dig this
>>> instructional text out.
>>>
>>> Link to both the docx and the java code -
>>> https://limewire.com/d/qtC1E#79Q8zip1SU
>>>
>>> $ javac -cp ~/tika/tika-app/target/lib/*:. POILinkExtractor.java
>>> $ java -cp ~/tika/tika-app/target/lib/*:. POILinkExtractor
>>> missing-link.docx
>>>
>>> Is this something that might see a place in Tika? As an option on the
>>> existing XWPFWordExtractorDecorator? Or as a new parser in that package? Or
>>> would I be best doing something outside of Tika for this case?
>>>
>>> Thanks,
>>> Mike
>>>
>>
