(This was originally a response to a comment in bug 3131, but turned
into a proposal.)

> 5) I agree that full is not appropriate for "simple" tests, that's not
> what it's meant to be used for.  I'd actually like to get rid of
> "full" since nothing we have uses it, and if you're resorting to full
> there's likely better ways to search for what you want.

full seems to be used mostly for truly raw rawbody tests.  Including
headers in our [whatever we call the truly raw rawbody test] seems like
extra work that is generally not needed.
 
> 6) I wouldn't mind adding in some form of ":all" functionality for
> body/rawbody which would basically do a join("",@array_of_paragraphs)
> && s/\s+/ /g before doing the test.  That way people who want to try
> it can, and if they blow their own system up with bad RE -- well,
> that's not our problem. ;)

I'm not really in favor of that transform.  I think it would be
confusing, not too helpful, and slow.  If you want to look at the whole
message, write an eval.
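
For reference, the transform being proposed amounts to roughly the
following.  This is only a Python sketch of the quoted
join("",@array_of_paragraphs) && s/\s+/ /g idea; the name join_all is
made up for illustration and nothing like it exists in the code today.

```python
import re


def join_all(paragraphs):
    """Join every paragraph into one string, then collapse each run of
    whitespace into a single space (hypothetical ":all" behaviour)."""
    return re.sub(r"\s+", " ", "".join(paragraphs))


if __name__ == "__main__":
    # Each element stands in for one paragraph of a message body.
    sample = ["Buy now!\n", "Limited   offer\n", "click\there\n"]
    print(repr(join_all(sample)))
```

Note that every rule using this would pay for a copy of the whole
message plus a global substitution before the regex even starts.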

I think we could possibly reexamine our message breakdown:

 * header tests - well-structured, so I have no changes to propose
 * body - decoded and rendered text, works quite well the way it is
 * rawbody - confusing name, decoded text, not rendered
 * full - pristine headers+body

We have also had some tests that iterate over the pristine _body_ data,
but none right now.

Our "full" tests currently don't use the full data at all.

Given how we decode MIME data, I think this might make sense:

 (To prevent some possible nitpicking, when I say "text" below, I
 generally mean text|message or whatever we decide to do about the
 orthogonal embedded message and the Apple Mail issue.)

 * header - stays the same
 * body - decoded and rendered text (unchanged)
 * decoded - decoded text (by default, see below), not rendered (new
             type, similar to the old "rawbody")
 * raw - pristine body, no changes (raw means raw, whee)
 * full - pristine message, headers plus body (mostly for checksum tests)
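
To make the five views concrete, here is a rough Python sketch using
the standard email package on a tiny base64-encoded HTML message.  The
function name message_views, the sample message, and the tag-stripping
regex standing in for "rendering" are all assumptions for illustration;
the real rendering is far more involved than this.

```python
import base64
import email
import re

HTML = "<b>Hi there</b>"
ENCODED = base64.b64encode(HTML.encode("ascii")).decode("ascii")

# A minimal single-part message whose body is base64-encoded HTML.
RAW_MESSAGE = (
    "Subject: hello\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + ENCODED + "\n"
)


def message_views(raw_message):
    """Return each proposed view as a list of strings (hypothetical)."""
    header_block, raw_body = raw_message.split("\n\n", 1)
    msg = email.message_from_string(raw_message)

    # "decoded": transfer encoding undone, markup left alone.
    decoded = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            data = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "ascii"
            decoded.append(data.decode(charset, errors="replace"))

    # "body": decoded and rendered; stripping tags is a crude stand-in.
    body = [re.sub(r"<[^>]+>", "", chunk) for chunk in decoded]

    return {
        "header": header_block.split("\n"),
        "body": body,
        "decoded": decoded,
        "raw": [raw_body],        # pristine body, still base64
        "full": [raw_message],    # pristine headers plus body
    }


if __name__ == "__main__":
    for name, chunks in message_views(RAW_MESSAGE).items():
        print(name, chunks)
```

The point of the example is that "raw" still contains the base64 text,
"decoded" contains the HTML, and only "body" contains what a reader
would actually see.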

For each test, make the default form of the data a reference to an
array, the way body currently works.

Next, a set of modifiers for each:

  one common modifier for all 4 non-header types:
    - a 'join' (or 'string' or whatever) modifier to return the entire
      data in a single string, performance-be-damned
  one modifier for "decoded":
    - a regex to select decoded versions of specific content-types, any
      possible content type: text, application, image, etc.  "decoded"
      would default to the same set of types that are ultimately
      rendered as body, of course
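
Roughly what those two modifiers would do, again as a Python sketch
rather than anything resembling final syntax.  The names decoded_parts
and joined, and the default regex of ^text/, are assumptions made up
for this example; the message is built with the standard email.mime
classes.

```python
import re
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def decoded_parts(msg, type_regex=r"^text/"):
    """Return the decoded payload of every leaf part whose content type
    matches type_regex, as a list of strings (hypothetical modifier)."""
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        if re.search(type_regex, part.get_content_type()):
            data = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "ascii"
            chunks.append(data.decode(charset, errors="replace"))
    return chunks


def joined(chunks):
    """The 'join' modifier: the whole view as one string."""
    return "".join(chunks)


def sample_message():
    """Build a small multipart message with text and non-text parts."""
    msg = MIMEMultipart("mixed")
    msg.attach(MIMEText("plain part", "plain"))
    msg.attach(MIMEText("<p>html part</p>", "html"))
    msg.attach(MIMEApplication(b"%PDF-1.4 fake", _subtype="pdf"))
    return msg


if __name__ == "__main__":
    msg = sample_message()
    print(decoded_parts(msg))                     # default: text types
    print(decoded_parts(msg, r"^application/"))   # select other types
    print(joined(decoded_parts(msg)))             # single string
```

Keeping the array as the default and making the single string an
explicit opt-in means the expensive path is only taken by rules that
ask for it.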

Note that I'm not really addressing the syntax for the modifiers yet.
I'm just thinking about what we want and what could be useful, and
looking for the 99% solution.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
