Re: HeaderParser loads email body into RAM

Tim Legant Thu, 14 Aug 2003 12:40:42 -0700

Jesse Guardiani <[EMAIL PROTECTED]> writes:

> And figure out how to print the body of the message to disk/stdout without
> using the email module. Instead of this:
> 
>   print msg_as_string(msgin)
> 
> I do this:
> 
>   # Print headers
>   print msg_as_string(msgin)
>   # Print body
>   print sys.stdin.read()


I don't think this does what you think it does.  Your code may be a
little bit faster than Util.msg_as_string(), since it doesn't have to
create a HeaderGenerator and use that to produce the message as a
string, but it's essentially the same thing that TMDA does; i.e., it
reads the entire body of the message into a Python string variable.

When TMDA uses HeaderParser, that string variable is stored as the
message's payload; in your case, it's a temporary (unnamed) variable.
Even though you immediately write it to stdout, Python has to allocate
enough memory to hold the entire message body.  That's how the read()
method of file objects is defined: called with no size argument, it
reads to EOF and returns everything as a single string.
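
For contrast, here is a minimal sketch of what a copy that does not
hold the whole body in memory could look like.  It is not what TMDA
or your code does; copy_body and chunk_size are hypothetical names
made up for illustration, and the sketch is written for current
Python rather than the version we are running.

```python
import io


def copy_body(src, dst, chunk_size=64 * 1024):
    """Copy src to dst one chunk at a time; return the amount copied."""
    total = 0
    while True:
        # read(size) returns at most `size` characters (or bytes), so
        # only one chunk is held in memory at any moment.
        chunk = src.read(chunk_size)
        if not chunk:
            # An empty result means EOF.
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Small in-memory demonstration; in a filter this would be something
# like copy_body(sys.stdin, sys.stdout).
source = io.StringIO("x" * 200000)
sink = io.StringIO()
copied = copy_body(source, sink, chunk_size=1000)
```

The standard library's shutil.copyfileobj() does essentially the same
loop, so in practice that is probably the simpler choice.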

HeaderGenerator (used by TMDA) does just a little more than your code
does.  The header writing is identical; the body write looks like
this:

        payload = msg.get_payload()
        if payload is None:
            return
        if not _isstring(payload):
            raise TypeError, 'string payload expected: %s' % type(payload)
        if self._mangle_from_:
            payload = fcre.sub('>From ', payload)
        self._fp.write(payload)

There are a couple of quick tests for missing or non-string payloads,
a regular expression substitution to escape lines that start with
'From ' (this only happens when writing to mbox files, in TMDA's case)
and then the entire string payload is written to the file at once.
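
To make the 'From ' escaping concrete, here is a small sketch of the
substitution on its own.  The pattern below is my assumption of what
fcre looks like (a multiline match on 'From ' at the start of a
line); mangle_from is a hypothetical helper, not a function in TMDA
or the email package.

```python
import re

# Assumed pattern: "From " at the beginning of any line in the payload.
fcre = re.compile(r'^From ', re.MULTILINE)


def mangle_from(payload):
    """Escape body lines that an mbox reader could mistake for a
    message separator by prefixing them with '>'."""
    return fcre.sub('>From ', payload)


body = "Hello\nFrom here on it gets tricky.\nNot From the start.\n"
mangled = mangle_from(body)
```

Only the second line changes, since it is the only one that begins
with 'From '.  Note that the substitution works on the whole payload
string at once, which is another reason the body is in memory.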

If your benchmark ends up showing a big difference in speed between
those two examples, it's a good indication that there's something
wrong with your benchmark.

TMDA uses more RAM because it has two copies of the message in memory
(because of the StringIO object).  That object, stdin, probably isn't
necessary anymore, since we switched to using HeaderParser.  It came
into existence when we first started using the email package and were
using the full Parser.  It isn't referenced at all in tmda-rfilter,
except when we first read the message from it.  In my opinion it can
be removed.  That would help with RAM usage (the entire message would
still be in memory, just not twice), but it won't make any meaningful
difference in speed.
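
A rough sketch of the "two copies" situation, written for current
Python rather than the version TMDA runs on.  The sample message and
the variable names are made up for illustration; this is not the
tmda-rfilter code.

```python
import io
from email.parser import HeaderParser

raw = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "\n"
    "line one\n"
    "line two\n"
)

# First copy: the whole message buffered in a StringIO object.
buffered = io.StringIO(raw)

# HeaderParser parses only the headers, but it still reads the rest
# of the input and stores it, unparsed, as the message's payload.
msg = HeaderParser().parse(buffered)

# Second copy: the body, held as one string on the message object.
body = msg.get_payload()
```

After parsing, buffered still holds the full text and msg holds the
body again as its payload.  Dropping the StringIO object would leave
just the one copy.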


Tim

_________________________________________________
tmda-workers mailing list ([EMAIL PROTECTED])
http://tmda.net/lists/listinfo/tmda-workers
