Re: Mime type magic and repeated similar blocks - thoughts?

2020-06-09 Thread Tim Allison
I like the regex option, and I _think_ that the anchor at the beginning (along with the lack of backtracking) shouldn't cause horrible performance degradation.

On Tue, Jun 9, 2020 at 7:04 AM Nick Burch wrote:
> Hi All
>
> At the moment, to detect RFC822 emails, we try and check for a bunch of
>

Mime type magic and repeated similar blocks - thoughts?

2020-06-09 Thread Nick Burch
Hi All

At the moment, to detect RFC822 emails, we try and check for a bunch of common header lines right at the start. If none of those match, we check for a few patterns that could be an unusual header or could just be some text, followed by checking for common headers in a larger area of text below. For example, starts
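
For readers following the thread, the two-stage check described above might be sketched roughly as follows. This is only an illustration in Python, not the project's actual magic definitions: the header list, the 1024-byte window, and the function name `looks_like_rfc822` are all assumptions made up for the example.

```python
import re

# Hypothetical list of "common" headers; the real list is not given in the thread.
COMMON_HEADERS = ("From", "To", "Subject", "Date", "Received",
                  "Message-ID", "MIME-Version", "Return-Path")
_names = b"|".join(h.encode("ascii") for h in COMMON_HEADERS)

# Stage 1: a common header name right at the start of the data (anchored with \A).
COMMON_AT_START = re.compile(rb"\A(?:" + _names + rb"):", re.IGNORECASE)

# Stage 2a: something shaped like "Name: value" at the start. This could be an
# unusual header or ordinary text. Field names are printable ASCII without ':'.
GENERIC_HEADER_AT_START = re.compile(rb"\A[!-9;-~]+:[ \t]")

# Stage 2b: a common header at the start of any line in a larger window.
COMMON_ON_ANY_LINE = re.compile(rb"^(?:" + _names + rb"):",
                                re.IGNORECASE | re.MULTILINE)


def looks_like_rfc822(data: bytes, window: int = 1024) -> bool:
    """Return True if the data plausibly starts an RFC822 message.

    The window size is an assumed value chosen for illustration.
    """
    if COMMON_AT_START.match(data):
        return True
    if GENERIC_HEADER_AT_START.match(data):
        # An unusual first line only counts if a common header follows nearby.
        return COMMON_ON_ANY_LINE.search(data[:window]) is not None
    return False
```

Because the stage 1 and stage 2a patterns are anchored with `\A`, they only ever inspect the first line's worth of bytes; only the stage 2b search scans the window.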