I like the regex option, and I _think_ that the anchor at the beginning
(along with the lack of backtracking) shouldn't cause horrible performance
degradation.
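A minimal Python sketch of the anchored-regex idea. It assumes the detector only needs to test the first bytes of the input. The header list and the `looks_like_rfc822` name are hypothetical and are not Tika's actual detector. Because the pattern is anchored at the start and contains no quantifiers, a non-email input fails after comparing a handful of characters per alternative. It never rescans the buffer.

```python
import re

# Hypothetical set of "common" RFC 822 header names (illustrative only).
_COMMON_HEADERS = (
    "Received", "Return-Path", "Delivered-To", "From", "To",
    "Subject", "Date", "Message-ID", "MIME-Version",
)

# \A pins the match to position 0, so only one starting point is tried.
# The alternation is followed by a literal ':' and has no repetition, so
# a failed match gives up quickly. Header names are case-insensitive.
_HEADER_AT_START = re.compile(
    r"\A(?:" + "|".join(re.escape(h) for h in _COMMON_HEADERS) + r"):",
    re.IGNORECASE,
)

def looks_like_rfc822(prefix: bytes) -> bool:
    """Return True if the input begins with one of the common header names."""
    text = prefix.decode("ascii", errors="replace")
    return _HEADER_AT_START.match(text) is not None
```

For example, `looks_like_rfc822(b"Received: from mail.example.com\r\n")` is true, while text that merely contains `Received:` later in the line is rejected.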
On Tue, Jun 9, 2020 at 7:04 AM Nick Burch wrote:
Hi All
At the moment, to detect RFC822 emails, we check for a set of common
header lines right at the start. If none are found, we check for a few
lines that could be an unusual header or could be ordinary text, and then
look for common headers in a larger area of text below.
For example, starts