Thank you, Tim! The PR looks great.

On Mon, Feb 2, 2026 at 7:18 PM Tim Allison <[email protected]> wrote:

> Thank you for raising this and sharing a triggering doc. I've opened:
> https://issues.apache.org/jira/browse/TIKA-4646
>
> On Mon, Feb 2, 2026 at 11:02 AM Mike Flester via user <[email protected]> wrote:
>
>> Hello -
>>
>> The OOXML (.docx) file began life as a phishing email attachment. The
>> attacker's hyperlink has been replaced with something benign.
>>
>> Tika did not extract the link because (I think) it's in "instructional
>> text". The document appears to work fine (the victim is able to click the
>> link).
>>
>> I have a bit of POI code (not production quality) that can dig this
>> instructional text out.
>>
>> Link to both the .docx and the Java code:
>> https://limewire.com/d/qtC1E#79Q8zip1SU
>>
>> $ javac -cp ~/tika/tika-app/target/lib/*:. POILinkExtractor.java
>> $ java -cp ~/tika/tika-app/target/lib/*:. POILinkExtractor missing-link.docx
>>
>> Is this something that might have a place in Tika? As an option on the
>> existing XWPFWordExtractorDecorator? As a new parser in that package? Or
>> would I be better off doing something outside of Tika for this case?
>>
>> Thanks,
>> Mike
>>
>
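A minimal sketch of the idea in Mike's message, assuming that "instructional text" means Word field codes: the link is stored as a `HYPERLINK` field instruction in `w:instrText` runs (or in a `w:fldSimple` element's `w:instr` attribute) rather than as a `w:hyperlink` element. It uses only the JDK's StAX parser instead of POI, reads only `word/document.xml` (headers, footers and footnotes are skipped), and the class and method names are hypothetical. It is not Mike's `POILinkExtractor` and not Tika's `XWPFWordExtractorDecorator`.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: collects targets of HYPERLINK field codes in a .docx body.
public class FieldHyperlinkSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    // Matches HYPERLINK "target" or HYPERLINK target; bookmark-only (\l) links are not matched.
    static final Pattern HYPERLINK =
            Pattern.compile("^\\s*HYPERLINK\\s+(?:\"([^\"]*)\"|([^\\s\\\\\"]+))");

    // One complex field: fldChar begin, instrText..., fldChar separate, result, fldChar end.
    private static final class Field {
        final StringBuilder instr = new StringBuilder();
        boolean done; // true once the instruction has been evaluated
    }

    static void collect(String instr, List<String> out) {
        Matcher m = HYPERLINK.matcher(instr);
        if (m.find()) {
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
    }

    /** Returns HYPERLINK field targets from a WordprocessingML part such as word/document.xml. */
    static List<String> hyperlinkFieldTargets(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // input is untrusted
        XMLStreamReader r = factory.createXMLStreamReader(xml);
        Deque<Field> open = new ArrayDeque<>(); // fields can nest
        List<String> targets = new ArrayList<>();
        boolean inInstr = false;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT && W_NS.equals(r.getNamespaceURI())) {
                String name = r.getLocalName();
                if (name.equals("fldSimple")) {
                    // Simple field: the whole instruction is one attribute.
                    String instr = r.getAttributeValue(W_NS, "instr");
                    if (instr != null) {
                        collect(instr, targets);
                    }
                } else if (name.equals("fldChar")) {
                    String type = r.getAttributeValue(W_NS, "fldCharType");
                    if ("begin".equals(type)) {
                        open.push(new Field());
                    } else if (("separate".equals(type) || "end".equals(type)) && !open.isEmpty()) {
                        Field f = "end".equals(type) ? open.pop() : open.peek();
                        if (!f.done) {
                            collect(f.instr.toString(), targets);
                            f.done = true;
                        }
                    }
                } else if (name.equals("instrText")) {
                    inInstr = true;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && "instrText".equals(r.getLocalName())) {
                inInstr = false;
            } else if (event == XMLStreamConstants.CHARACTERS && inInstr && !open.isEmpty()) {
                // One instruction can be split across several runs, so append, don't replace.
                open.peek().instr.append(r.getText());
            }
        }
        r.close(); // does not close the underlying stream
        return targets;
    }

    public static void main(String[] args) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(new FileInputStream(args[0]))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals("word/document.xml")) {
                    hyperlinkFieldTargets(zip).forEach(System.out::println);
                }
            }
        }
    }
}
```

Run it as `java FieldHyperlinkSketch missing-link.docx`; each recovered target prints on its own line. Buffering the instruction per field (instead of matching each `w:instrText` on its own) is what lets it recover a URL that has been split across runs.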
