Will do. Of course, the implementation is down to you guys: do whatever you think is most sensible without breaking things for others.
The current detector just looks for {\rtf. If it just made the f optional, or did not look for it at all, then I am pretty certain that it would break nothing, but I would be happy with an artificial mime-type too. I have worked around it for now, so I can wait for the next release cycle.

I will add an rtf that does not contain malware, for sure. In fact, all you need to do is use vi to delete the f1 part of any normal rtf magic and you have your test. I will attach it though 😊

Cheers,

Jim

> -----Original Message-----
> From: Nick Burch [mailto:apa...@gagravarr.org]
> Sent: Thursday, March 1, 2018 21:14
> To: user@tika.apache.org
> Subject: Re: Malware RTF is not detected as RTF
>
> On Thu, 1 Mar 2018, Jim Idle wrote:
> > Malicious RTF files take advantage of the fact that Microsoft do not
> > follow their own RTF spec. Specifically, Word et al only look for the
> > opening sequence:
> >
> > {\rt
> >
> > Though the spec says it should be:
> >
> > {\rtf1
>
> I don't think that Tika can assume that all RTF users are as broken as Word
> is!
>
> I'd be tempted to define a new mimetype of application/x-broken-rtf or
> similar, and feed that a lower priority magic for {\rt, with a suitable
> comment/explanation. That way, we won't tell people something is an RTF
> which isn't, but we can help them spot these problematic files
>
> If you could create a small, broken but non-malicious rtf file, then raise an
> enhancement jira + attach, that'd be great!
>
> Nick
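
For what it's worth, here is a rough sketch of the two-tier prefix check discussed in this thread, plus the "delete the f1" trick for producing a harmless test file. It is plain Python, not Tika code: `classify` and `make_broken_sample` are hypothetical names, `application/rtf` is assumed to be the label used for ordinary RTF, and `application/x-broken-rtf` is only the name floated above, not an existing type.

```python
# Illustrative only: mimics the magic-prefix idea from the thread, not Tika's detector.
RTF_MAGIC = b"{\\rtf"      # prefix the current detector is described as matching
LENIENT_MAGIC = b"{\\rt"   # shorter prefix that Word is reported to accept


def classify(data: bytes) -> str:
    """Return a (hypothetical) type label based on the leading bytes."""
    if data.startswith(RTF_MAGIC):
        return "application/rtf"  # assumed label for ordinary RTF
    if data.startswith(LENIENT_MAGIC):
        # Lower-priority match: checked only after the normal magic fails.
        return "application/x-broken-rtf"  # name suggested in the thread
    return "application/octet-stream"


def make_broken_sample(valid_rtf: bytes) -> bytes:
    """Drop the 'f1' from the standard magic, like the vi edit described above."""
    prefix = b"{\\rtf1"
    if not valid_rtf.startswith(prefix):
        raise ValueError("input does not start with the standard RTF magic")
    return LENIENT_MAGIC + valid_rtf[len(prefix):]


sample = b"{\\rtf1\\ansi Hello}"
broken = make_broken_sample(sample)
print(classify(sample), classify(broken))
```

Checking the full magic first and the shorter prefix second is what keeps ordinary RTF files from being reported under the artificial type.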