Re: HeaderParser loads email body into RAM

Jesse Guardiani Sun, 10 Aug 2003 18:41:20 -0700

On 09 Aug 2003 01:33:52 -0500, Tim Legant <[EMAIL PROTECTED]> wrote:

>Jesse Guardiani <[EMAIL PROTECTED]> writes:
>
>> And figure out how to print the body of the message to disk/stdout without
>> using the email module. Instead of this:
>> 
>>   print msg_as_string(msgin)
>> 
>> I do this:
>> 
>>   # Print headers
>>   print msg_as_string(msgin)
>>   # Print body
>>   print sys.stdin.read()
>
>I don't think this does what you think it does.


Are you 100% sure about that?


>  Your code may be a
>little bit faster than Util.msg_as_string(), since it doesn't have to
>create a HeaderGenerator and use that to produce the message as a
>string, but it's essentially the same thing that TMDA does; i.e., it
>reads the entire body of the message into a Python string variable.
>
>When TMDA uses HeaderParser, that string variable is stored as the
>message's payload; in your case, it's a temporary (unnamed) variable.
>Even though you immediately write it to stdout, Python has to allocate
>the full amount of memory necessary to read the entire message.
>That's the definition of the read() method of file objects.

You didn't cite any reference material to back up that claim. If you
do indeed have proof of this, I'd like to take a good look at it. (I'd
look at the Python source code, but I don't think I'd know where
to begin...)
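
For what it's worth, the two read styles are easy to compare in
isolation. Below is a minimal sketch in present-day Python 3 (not the
2.2.2 discussed in this thread). The copy_in_chunks name and the 64 KB
chunk size are made up for illustration and are not from my script or
from TMDA. read() with no argument returns everything left in the file
as one object, while read(n) returns at most n bytes, so a loop over
read(n) only ever holds one chunk at a time.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical buffer size, not taken from the thread


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst one chunk at a time.

    Returns (total bytes copied, size of the largest single chunk).
    """
    total = 0
    largest = 0
    while True:
        chunk = src.read(chunk_size)  # returns at most chunk_size bytes
        if not chunk:                 # empty result means end of file
            break
        dst.write(chunk)
        total += len(chunk)
        largest = max(largest, len(chunk))
    return total, largest


body = b"x" * (1024 * 1024)       # stand-in for a 1 MB message body
whole = io.BytesIO(body).read()   # no size argument: one 1 MB object
out = io.BytesIO()
total, largest = copy_in_chunks(io.BytesIO(body), out)
```

Here whole is a single object the size of the body, whereas the
chunked copy never holds more than CHUNK_SIZE bytes of it at once.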

My tests with Python 2.2.2 on FreeBSD 4.8-RELEASE indicate
exactly the opposite. I am currently running the "fixed" version of
my script on my production SMTP server, with softlimit set to 10
Megs for my SMTP connections. Earlier versions of my script, without
the "fix", would throw out-of-memory errors if I sent a message even
half as large as the softlimit (minus the various daemon sizes).
However, I just sent a 22.5 Meg test message through my production
SMTP server and it arrived in my inbox.

It does indeed appear to be doing what I think it does.


>
>HeaderGenerator (used by TMDA) does just a little more than your code
>does.  The header writing is identical; the body write looks like
>this:
>
>        payload = msg.get_payload()
>        if payload is None:
>            return
>        if not _isstring(payload):
>            raise TypeError, 'string payload expected: %s' % type(payload)
>        if self._mangle_from_:
>            payload = fcre.sub('>From ', payload)
>        self._fp.write(payload)
>
>There are a couple of quick tests for missing or non-string payloads,
>a regular expression substitution to escape lines that start with 'From'
>(this only happens when writing to mbox files, in TMDA's case) and
>then the entire string payload is written to the file at once.
>
>If your benchmark ends up showing a big difference in speed between
>those two examples it's a good indication that there's something wrong
>with your benchmark.

Well, if you can prove to me that my test message isn't really 22.5
Megs then I might believe you. :-)

Perhaps I should provide my script to the tmda development list for
open testing? I'd have to clear that with my manager first, but if it
would clear up some skepticism I think we might be willing to do it.


>
>TMDA uses more RAM because it has two copies of the message in memory
>(because of the StringIO object).  That object, stdin, probably isn't
>necessary anymore, since we switched to using HeaderParser.  It came
>into existence when we first started using the email package and were
>using the full Parser.  It isn't referenced at all in tmda-rfilter,
>except when we first read the message from it.  In my opinion it can
>be removed.  That would help with RAM usage (the entire message would
>still be in memory, just not twice), but it won't make any meaningful
>difference in speed.
>
>
>Tim
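
If the goal is to keep the body out of a Python string altogether, one
possible approach is to read only the header block, parse that, and
then copy the rest of the input straight through. This is a rough
sketch in present-day Python 3, not TMDA code and not my production
script. The read_header_block and deliver names are hypothetical.
HeaderParser and shutil.copyfileobj are the standard library calls.

```python
import io
import shutil
from email.parser import HeaderParser


def read_header_block(fp):
    """Return the header text, including the blank line that ends it.

    Stops reading as soon as the separator line is seen, so the file
    object is left positioned at the first line of the body.
    """
    lines = []
    for line in iter(fp.readline, ""):   # "" signals end of file
        lines.append(line)
        if line in ("\n", "\r\n"):       # blank separator line
            break
    return "".join(lines)


def deliver(src, dst):
    """Parse the headers only, then stream the body from src to dst."""
    header_text = read_header_block(src)
    msg = HeaderParser().parsestr(header_text)
    # Header-based decisions would go here; the body is still unread.
    dst.write(header_text)
    shutil.copyfileobj(src, dst)         # copies in fixed-size chunks
    return msg
```

The message object returned here carries the headers only; the body
goes from src to dst without ever being stored as a payload.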

--
Jesse Guardiani, Systems Administrator
WingNET Internet Services,
P.O. Box 2605 // Cleveland, TN 37320-2605
423-559-LINK (v)  423-559-5145 (f)
http://www.wingnet.net


_________________________________________________
tmda-workers mailing list ([EMAIL PROTECTED])
http://tmda.net/lists/listinfo/tmda-workers
