Thank you, Mike.

If you are familiar with the other features I added, please review them carefully.
I relied on Claude for those and for generating the test files for DDE, etc.

On Tue, Feb 3, 2026 at 7:57 AM Mike Flester via user <[email protected]>
wrote:

> Thank you, Tim! The PR looks great.
>
> On Mon, Feb 2, 2026 at 7:18 PM Tim Allison <[email protected]> wrote:
>
>> Thank you for raising this and sharing a triggering doc. I've opened:
>> https://issues.apache.org/jira/browse/TIKA-4646
>>
>> On Mon, Feb 2, 2026 at 11:02 AM Mike Flester via user <
>> [email protected]> wrote:
>>
>>> Hello -
>>>
>>> The ooxml/docx began life as a phishing email attachment. The attacker
>>> hyperlink has been replaced with something benign.
>>>
>>> Tika did not extract the link because (I think) it's in "instructional
>>> text". The document appears to work fine (the victim is able to click the
>>> link).
>>>
>>> I have a bit of POI code (not production quality) that can dig this
>>> instructional text out.
>>>
>>> Link to both the docx and the java code -
>>> https://limewire.com/d/qtC1E#79Q8zip1SU
>>>
>>> $ javac -cp ~/tika/tika-app/target/lib/*:. POILinkExtractor.java
>>> $ java -cp ~/tika/tika-app/target/lib/*:. POILinkExtractor missing-link.docx
>>>
>>> Is this something that might see a place in Tika? As an option on the
>>> existing XWPFWordExtractorDecorator? Or as a new parser in that package? Or
>>> would I be best doing something outside of Tika for this case?
>>>
>>> Thanks,
>>> Mike
>>>
>>
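
For readers following the thread: the "instructional text" described in the quoted message most likely corresponds to WordprocessingML field instructions (w:instrText runs between w:fldChar markers, or the w:instr attribute of w:fldSimple), where a HYPERLINK field keeps its target. The sketch below illustrates that idea with the JDK only. It is not the POILinkExtractor code from the thread and not the TIKA-4646 change; the class and method names are hypothetical, it reads only word/document.xml (no headers, footers, or notes), and it ignores nested fields and field switches.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical name; a sketch, not Tika or POI code.
class InstrTextLinkSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    // Assumption: the target is the first quoted argument after HYPERLINK.
    static final Pattern HYPERLINK = Pattern.compile("HYPERLINK\\s+\"([^\"]+)\"");

    // Collects HYPERLINK targets from field instructions in one document part.
    static List<String> linksFromDocumentXml(InputStream in) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the input may be hostile.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = dbf.newDocumentBuilder().parse(in);

        List<String> links = new ArrayList<>();
        StringBuilder instr = new StringBuilder();
        // All w:* elements, in document order.
        NodeList all = doc.getElementsByTagNameNS(W_NS, "*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String name = e.getLocalName();
            if ("instrText".equals(name)) {
                // An instruction may be split across several runs, so buffer it.
                instr.append(e.getTextContent());
            } else if ("fldChar".equals(name)) {
                // Any field marker (begin/separate/end) closes the buffered text.
                collect(instr.toString(), links);
                instr.setLength(0);
            } else if ("fldSimple".equals(name)) {
                // Simple fields carry the instruction in an attribute.
                collect(e.getAttributeNS(W_NS, "instr"), links);
            }
        }
        collect(instr.toString(), links); // leftover text without a closing marker
        return links;
    }

    static void collect(String instruction, List<String> links) {
        Matcher m = HYPERLINK.matcher(instruction);
        while (m.find()) {
            links.add(m.group(1));
        }
    }

    public static void main(String[] args) throws Exception {
        try (ZipFile zip = new ZipFile(args[0])) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                System.err.println("no word/document.xml in " + args[0]);
                return;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                linksFromDocumentXml(in).forEach(System.out::println);
            }
        }
    }
}
```

It can be compiled and run without the Tika classpath, for example: javac InstrTextLinkSketch.java, then java InstrTextLinkSketch missing-link.docx. Buffering w:instrText across runs matters because Word is free to split a single instruction into several runs.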
