I don't know any more about those other features but I'm very excited to get the new property flags into production. I think that will unlock a bunch of learning. Thank you for adding that.
On Tue, Feb 3, 2026 at 1:07 PM Tim Allison <[email protected]> wrote:

> Thank you, Mike.
>
> If you know more about the other features I added, please carefully
> review. I trusted Claude on those and to generate test files for dde etc.
>
> On Tue, Feb 3, 2026 at 7:57 AM Mike Flester via user <[email protected]>
> wrote:
>
>> Thank you Tim! the PR looks great.
>>
>> On Mon, Feb 2, 2026 at 7:18 PM Tim Allison <[email protected]> wrote:
>>
>>> Thank you for raising this and sharing a triggering doc. I've opened:
>>> https://issues.apache.org/jira/browse/TIKA-4646
>>>
>>> On Mon, Feb 2, 2026 at 11:02 AM Mike Flester via user <
>>> [email protected]> wrote:
>>>
>>>> Hello -
>>>>
>>>> The ooxml/docx began life as a phishing email attachment. The attacker
>>>> hyperlink has been replaced with something benign.
>>>>
>>>> Tika did not extract the link because (I think) it's in "instructional
>>>> text". The document appears to work fine (the victim is able to click the
>>>> link).
>>>>
>>>> I have a bit of POI code (not production quality) that can dig this
>>>> instructional text out.
>>>>
>>>> Link to both the docx and the java code -
>>>> https://limewire.com/d/qtC1E#79Q8zip1SU
>>>>
>>>> $ javac -cp ~/tika/tika-app/target/lib/*:. POILinkExtractor.java
>>>> $ java -cp ~/tika/tika-app/target/lib/*:. POILinkExtractor missing-link.docx
>>>>
>>>> Is this something that might see a place in Tika? As an option on the
>>>> existing XWPFWordExtractorDecorator? Or as a new parser in that package? Or
>>>> would I be best doing something outside of Tika for this case?
>>>>
>>>> Thanks,
>>>> Mike
