Re: [Haskell-cafe] Status of MIME Strike Force

2007-06-28 Thread Arie Peterson
Jeremy Shaw wrote:

 What is the status of the MIME Strike Force?

 Currently it is on hold while I work on some other higher priority
 projects. But, I do hope to get back to it soon. (Or, perhaps someone
 else will have time to work on it).

OK. Good to hear it is still alive, if slumbering.


I know nothing about MIME, so I will make some comments ;-).

 One way to express a filter that modifies an existing message is the
 following:

 exampleHeaders =
     ( (setHeader (Subject "whee")) .
       (setHeader (Subject "bork")) .
       (addHeader (Keywords ["baz", "bar", "bam"])) .
       (addHeader (Keywords ["zip", "zap", "zop"]))
     )

 where setHeader ensures that a header only appears once, and addHeader
 appends the header, leaving existing instances alone. The type system
 ensures that you can never call addHeader on (Subject "whee").
 Unfortunately, that code seems a bit verbose.

If you want to signify that some headers are added and others replaced -
which seems a good idea - then it's not so bad, is it? Perhaps replacing
'setHeader' by 'set', and removing some parentheses, it's pretty minimal:

  modifyHeaders = set (Subject "whee") . add (Keywords ["quux", "blub"])

Anyway, I wouldn't bother too much about the exact syntax, at this stage.


 A whole other area I have not dealt with yet is data-mining and
 filters that depend on the values of existing fields. For example:

  1. find all the headers that contain the string XXX
  2. find all the Keywords fields and merge them into a single Keywords
 field

Right. Another one:

  3. Examine the contents of a certain header, and use the result to
modify other headers. (Think of a spam filter, for instance.)

I'm not sure what types of transformation we need to support. (I
personally only need 'pure' parsing and composing, not this 'on-the-fly'
transforming, but it is clearly necessary for some applications.)
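
A minimal sketch of case 3, assuming headers are kept as a plain list of
(name, value) pairs. None of these names come from the library under
discussion: RawHeader as a pair, fieldValues, looksLikeSpam, flagSpam, the
"free money" rule and the X-Spam-Flag field are all illustrative assumptions.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Assumed representation: (field name, unparsed field value).
type RawHeader = (String, String)

-- All values of a field, in message order (field names are case-insensitive).
fieldValues :: String -> [RawHeader] -> [String]
fieldValues name hs = [v | (n, v) <- hs, map toLower n == map toLower name]

-- Hypothetical rule: a Subject mentioning "free money" is treated as spam.
looksLikeSpam :: [RawHeader] -> Bool
looksLikeSpam hs =
  any (("free money" `isInfixOf`) . map toLower) (fieldValues "Subject" hs)

-- Inspect one header and, depending on the result, add another one.
flagSpam :: [RawHeader] -> [RawHeader]
flagSpam hs
  | looksLikeSpam hs = ("X-Spam-Flag", "YES") : hs
  | otherwise        = hs
```

The filter still has type [RawHeader] -> [RawHeader], so it composes with
the set/add style filters above using (.).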

There are many similar problems. Suppose you need to change a CSS file, by
changing RGB colours to corresponding HSL colours, but only within certain
media sections and selectors. Furthermore, layout/whitespace/comments must
be preserved as much as possible. As with modifying an e-mail on-the-fly,
this cannot be done by first parsing the whole thing, then applying the
transformation as a pure function, then unparsing.

I have a vague idea on a way to deal with this, using some kind of
stateful stream processor. I'll try to code it up some time; maybe it
could be useful for MIME handling as well.


Greetings,

Arie


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Status of MIME Strike Force

2007-06-27 Thread Jeremy Shaw
At Wed, 27 Jun 2007 18:54:58 +0200 (CEST),
Arie Peterson wrote:

 What is the status of the MIME Strike Force?
 
 The goals proposed at
 http://www.haskell.org/haskellwiki/Libraries_and_tools/MIMEStrikeForce
 promise a very useful library. Has the design been initiated?

Currently it is on hold while I work on some other higher priority
projects. But, I do hope to get back to it soon. (Or, perhaps someone
else will have time to work on it).

It seems to me that the mime-string package is very close to what we
need for parsing MIME messages. If all you care about is parsing MIME
messages, I highly recommend it.

This leaves the problem of creating and transforming MIME
messages. The real difficulty in this area is creating a good API. If
you want to take a peek at what I have, take a look at:

http://www.n-heptane.com/nhlab/repos/haskell-mime/APIs.hs

Currently I am having difficulty dealing with headers that could appear more
than once.

For example, the 'Keywords' header can appear multiple times. So,
there are two cases to handle:

 1. add an additional Keywords header
 2. delete any existing Keywords headers and add the new one

Other fields, such as 'Subject', can appear only once.

One way to express a filter that modifies an existing message is the
following:

 exampleHeaders =
     ( (setHeader (Subject "whee")) .
       (setHeader (Subject "bork")) .
       (addHeader (Keywords ["baz", "bar", "bam"])) .
       (addHeader (Keywords ["zip", "zap", "zop"]))
     )

where setHeader ensures that a header only appears once, and addHeader
appends the header, leaving existing instances alone. The type system
ensures that you can never call addHeader on (Subject "whee").
Unfortunately, that code seems a bit verbose.
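
A minimal sketch of how the setHeader/addHeader distinction could be
enforced by types. This is not the code from APIs.hs: the class names
Header and Repeatable, the pair representation of RawHeader and the
rendering of Keywords are assumptions made for illustration.

```haskell
import Data.Char (toLower)
import Data.List (intercalate)

-- Assumed raw representation: (field name, unparsed field value).
type RawHeader = (String, String)

-- Hypothetical typed headers.
newtype Subject  = Subject String
newtype Keywords = Keywords [String]

-- Every typed header can be rendered as a raw header.
class Header h where
  toRaw :: h -> RawHeader

-- Only headers that may occur more than once get an instance of this class.
class Header h => Repeatable h

instance Header Subject where
  toRaw (Subject s) = ("Subject", s)

instance Header Keywords where
  toRaw (Keywords ks) = ("Keywords", intercalate ", " ks)

instance Repeatable Keywords

-- Remove every existing instance of the field, then append the new one.
setHeader :: Header h => h -> [RawHeader] -> [RawHeader]
setHeader h hs = filter (not . sameField) hs ++ [raw]
  where
    raw@(name, _)    = toRaw h
    sameField (n, _) = map toLower n == map toLower name

-- Append the field, leaving existing instances alone.
addHeader :: Repeatable h => h -> [RawHeader] -> [RawHeader]
addHeader h hs = hs ++ [toRaw h]

exampleHeaders :: [RawHeader] -> [RawHeader]
exampleHeaders =
    setHeader (Subject "whee")
  . setHeader (Subject "bork")
  . addHeader (Keywords ["baz", "bar", "bam"])
  . addHeader (Keywords ["zip", "zap", "zop"])
```

With these definitions, addHeader (Subject "whee") is rejected at compile
time because Subject has no Repeatable instance. Since (.) applies the
rightmost filter first, the Subject "whee" setting is the one that survives.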

We can make some helper functions that reduce the verbosity a bit:

 subject = setHeader . Subject
 keywords = addHeader . Keywords

 exampleHeaders3 :: [RawHeader] -> [RawHeader]
 exampleHeaders3 =
     ((subject "whee") .
      (subject "bork") .
      (keywords ["baz", "bar", "bam"]) .
      (keywords ["zip", "zap", "zop"]))

That is nice, except that we don't know which headers are going to use
setHeader and which are going to use addHeader. So, the results might
be a bit surprising. Additionally, keywords always uses addHeader, but
in some cases we might want it to use setHeader.

Another option is to try using infix operators, such as:

(.+.) for setHeader 
(.*.) for addHeader

 exampleHeaders2 :: [RawHeader]
 exampleHeaders2 =
     ((Subject "whee") .+.
      (Subject "bork") .+.
      (Keywords ["baz", "bar", "bam"]) .*.
      (Keywords ["zip", "zap", "zop"]) .*.
      empty
     )

This is good because:

 1. it is shorter than the first example that used setHeader/addHeader

 2. the information about whether setHeader/addHeader is being used is
preserved.

 3. it is easy to choose which one you want (setHeader/addHeader)

But, it is also really wonky because the operator has a bit of a
postfix feel to it. For example, it is the .*. at the end of this line
that is making it use addHeader. 

 (Keywords ["baz", "bar", "bam"]) .*.

If we wanted to use setHeader here, we would have to change it to:

 (Keywords ["baz", "bar", "bam"]) .+.

This behaviour seems pretty unintuitive.

A whole other area I have not dealt with yet is data-mining and
filters that depend on the values of existing fields. For example:

 1. find all the headers that contain the string XXX
 2. find all the Keywords fields and merge them into a single Keywords field
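
Both queries can be sketched over a plain list of (name, value) pairs.
This is only an illustration under that assumption: headersContaining,
isField and mergeKeywords are made-up names, and joining the keyword lists
with ", " is a guess at a reasonable merged form.

```haskell
import Data.Char (toLower)
import Data.List (intercalate, isInfixOf, partition)

-- Assumed representation: (field name, unparsed field value).
type RawHeader = (String, String)

-- 1. All headers whose value contains the given string.
headersContaining :: String -> [RawHeader] -> [RawHeader]
headersContaining needle = filter (\(_, value) -> needle `isInfixOf` value)

-- Field names are compared case-insensitively.
isField :: String -> RawHeader -> Bool
isField name (n, _) = map toLower n == map toLower name

-- 2. Merge all Keywords fields into a single one, placed where the first
--    Keywords field was; every other field keeps its position.
mergeKeywords :: [RawHeader] -> [RawHeader]
mergeKeywords hs =
  case break (isField "Keywords") hs of
    (_, [])                -> hs  -- no Keywords field at all
    (before, first : rest) ->
      let (more, others) = partition (isField "Keywords") rest
          merged         = intercalate ", " (map snd (first : more))
      in  before ++ [("Keywords", merged)] ++ others
```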

So, that is where I currently am. Once the API is worked out, I think
things should progress pretty easily. If you have any ideas, let me
know. I think I may be picking this up again in September or October.

HaXml, SYB, Uniplate, and HList seem like good places to get some ideas. If
anyone has suggestions for other projects to look at, let me know.

j.



Re: [Haskell-cafe] Status of MIME Strike Force

2007-06-27 Thread Marc Weber
  exampleHeaders2 :: [RawHeader]
  exampleHeaders2 =
      ((Subject "whee") .+.
       (Subject "bork") .+.
       (Keywords ["baz", "bar", "bam"]) .*.
       (Keywords ["zip", "zap", "zop"]) .*.
       empty
      )
 [...] 
 But, it is also really wonky because the operator has a bit of a
 postfix feel to it. For example, it is the .*. at the end of this line
 that is making it use addHeader. 

So why not use flip on each operator to get
 empty .*. (Subject "blah") .+. ... ?

That might be a little bit more comfortable?
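
A minimal sketch of the flipped, left-associative operators, so that each
operator sits in front of the header it applies to. The single Header sum
type, toRaw and the fixity level are assumptions chosen to keep the example
short; unlike the typed version discussed earlier, nothing here stops
.*. from being used with Subject.

```haskell
import Data.Char (toLower)
import Data.List (intercalate)

-- Assumed representation: (field name, unparsed field value).
type RawHeader = (String, String)

-- Hypothetical header type for this sketch.
data Header = Subject String | Keywords [String]

toRaw :: Header -> RawHeader
toRaw (Subject s)   = ("Subject", s)
toRaw (Keywords ks) = ("Keywords", intercalate ", " ks)

-- Left-associative, so the chain is built up from `empty` on the left.
infixl 5 .+., .*.

-- setHeader: drop existing instances of the field, then append the new one.
(.+.) :: [RawHeader] -> Header -> [RawHeader]
hs .+. h = filter (not . sameName) hs ++ [raw]
  where
    raw@(name, _)   = toRaw h
    sameName (n, _) = map toLower n == map toLower name

-- addHeader: append, leaving existing instances alone.
(.*.) :: [RawHeader] -> Header -> [RawHeader]
hs .*. h = hs ++ [toRaw h]

empty :: [RawHeader]
empty = []

exampleHeaders2 :: [RawHeader]
exampleHeaders2 =
  empty
    .*. Keywords ["zip", "zap", "zop"]
    .*. Keywords ["baz", "bar", "bam"]
    .+. Subject "bork"
    .+. Subject "whee"
```

Read top to bottom, each line now says "add this" or "set this", and
switching a line from add to set means changing the operator on that line.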

Marc Weber


Re: [Haskell-cafe] Status of MIME Strike Force

2007-06-27 Thread Jeremy Shaw
At Wed, 27 Jun 2007 21:14:02 +0200,
Marc Weber wrote:
 
   exampleHeaders2 :: [RawHeader]
   exampleHeaders2 =
       ((Subject "whee") .+.
        (Subject "bork") .+.
        (Keywords ["baz", "bar", "bam"]) .*.
        (Keywords ["zip", "zap", "zop"]) .*.
        empty
       )
  [...] 
  But, it is also really wonky because the operator has a bit of a
  postfix feel to it. For example, it is the .*. at the end of this line
  that is making it use addHeader. 
 
 So why not use flip on each operator to get
  empty .*. (Subject "blah") .+. ... ?
 
 That might be a little bit more comfortable?

Ah, good idea. I'll play with that. Thanks!

j.


Re: [Haskell-cafe] Status of MIME Strike Force

2007-06-27 Thread Tony Finch
On Wed, 27 Jun 2007, Jeremy Shaw wrote:

 Currently I am having difficulty dealing with headers that could appear more
 than once.

I recommend that you treat the header fields as an ordered list. Do not
use the latitude that the specification gives you to re-order headers, and
do not assume that messages you have to process will be within the minimum
and maximum count requirements for each field. (This rules out encoding
those requirements in the type system.) Postmasters will hate your
software if you do either of these things :-)

You need to support adding new fields to the top as well as the bottom
of the header. Although it's traditional in most situations to add a new
field to the bottom of the header, Received: fields must be added to the
start. For any application that does message processing on an MTA, it's a
*really* good idea to add new fields to the top of the header, so that
they appear interspersed amongst the Received: lines in a way that
indicates where and when the processing happened. Doing this also means
your program will play nicely with DKIM. Ignore the strict syntax in RFC
2822 that prevents arbitrary trace header fields: this is a bug that will
be fixed in the next version of the spec, and in practice software doesn't
mind unexpected trace fields.
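
A minimal sketch of the ordered-list approach, assuming a header is just a
list of (name, value) pairs in message order. Field, HeaderBlock,
prependField, appendField and fieldValues are illustrative names, not part
of any existing library.

```haskell
import Data.Char (toLower)

-- Assumed representation: (field name, unparsed field value).
type Field = (String, String)

-- Fields in exactly the order they appear in the message; never re-ordered.
type HeaderBlock = [Field]

-- Add a field at the top, as is done for Received: and other trace fields.
prependField :: Field -> HeaderBlock -> HeaderBlock
prependField f hs = f : hs

-- Add a field at the bottom, the traditional place for most new fields.
appendField :: Field -> HeaderBlock -> HeaderBlock
appendField f hs = hs ++ [f]

-- Every value of a field, in order. The result may be empty or hold several
-- entries, whatever the specification says about minimum and maximum counts.
fieldValues :: String -> HeaderBlock -> [String]
fieldValues name hs = [v | (n, v) <- hs, map toLower n == map toLower name]
```

Returning a list from fieldValues, rather than a single value or a Maybe,
keeps malformed messages (two Subject fields, no Date field) representable.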

Tony.
-- 
f.a.n.finch  [EMAIL PROTECTED]  http://dotat.at/
FAIR ISLE FAEROES: VARIABLE 3 OR 4, BUT NORTHERLY 5 OR 6 IN WEST FAEROES.
MODERATE OR ROUGH. RAIN OR SHOWERS. MODERATE OR GOOD.