Jesse Guardiani <[EMAIL PROTECTED]> writes:

> On Sat, 09 Aug 2003 13:42:06 -0400, Jesse Guardiani
> <[EMAIL PROTECTED]> wrote:
>
> >On 09 Aug 2003 01:33:52 -0500, Tim Legant <[EMAIL PROTECTED]> wrote:
> >
> >>Jesse Guardiani <[EMAIL PROTECTED]> writes:
> >>
> >>> I do this:
> >>>
> >>>     # Print headers
> >>>     print msg_as_string(msgin)
> >>>     # Print body
> >>>     print sys.stdin.read()
> >>
> >>I don't think this does what you think it does.
> >
> >Are you 100% sure about that?
>
> Well, even if you're not - I am, now.
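(A minimal demonstration of the issue in the quoted snippet, written in
modern Python with io.StringIO standing in for sys.stdin: read() with no
size argument returns the entire remaining input as one string, so the
whole message body is held in memory at once.)

```python
import io

# Stand-in for sys.stdin; io.StringIO behaves like a text file object.
stream = io.StringIO('Subject: test\n\nHello\nWorld\n')

# read() with no size argument returns the ENTIRE remaining input as a
# single string object -- the whole message ends up in memory at once.
body = stream.read()
print(type(body).__name__)   # str
print(stream.read() == '')   # True: the stream is now exhausted
```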
Good. I am, too. Here's a pointer to the Python docs for the read()
method of file objects:

    http://www.python.org/doc/2.2.2/lib/bltin-file-objects.html

Note that it says "The bytes are returned as a string object."

> My test message was running through my secondary MX for some
> reason, which caused it to avoid being processed by my script.
>
> Oops. :-)

<grin> It happens.

> I did some checking, and in a more fool (that's me) proof test, this
> appears to do the job:
>
>     while 1:
>         data = sys.stdin.read(256)
>         if data != '':
>             sys.stdout.write(data)
>         else:
>             sys.stdout.flush()
>             break
>
> I bet it's a bit slower than a straight copy, but it fits the bill for
> me.

This is the correct way to reduce memory use. You could even use a
bigger buffer, say 8K or so.

The outstanding problem with doing this is the filter. There are at
least three rules I can think of off the top of my head that require
the entire message body: 'body', 'body-file' and 'pipe'. The 'pipe'
rule could easily be re-implemented to page the message through to the
filter program, as in your code above. It's not so easy for the
'body*' rules. The problem is that a regular expression might match a
string composed of, say, 10 characters at the end of one buffer read
and 12 characters at the beginning of the next. The simple
implementation would be to search each buffer separately, but then the
text that should match in my example never would. You would need a
more complex algorithm. I know how to do it, but it's a much bigger
change than I want to make before 1.0.

Finally, if a filter uses more than one of those three rules, or any
one of them more than once, you'll be paging the message in multiple
times, which will undoubtedly be a speed hit. You could avoid this by
caching the entire message in memory if a particular filter required
it and then re-using the cached version in any other rules that
require it.
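(One way to sketch the "more complex algorithm" for matching across
buffer boundaries, in modern Python -- this is an illustration, not TMDA
code: keep the tail of the previous chunk and prepend it to the next one,
so a match straddling a boundary is still seen. It assumes no match is
longer than the overlap window.)

```python
import io
import re

def search_stream(stream, pattern, bufsize=8192, overlap=256):
    """Return True if `pattern` matches anywhere in `stream`,
    without reading the whole stream into memory at once.

    The last `overlap` characters of each chunk are carried over and
    prepended to the next chunk, so a match that straddles a buffer
    boundary is still found. Only valid if no match exceeds `overlap`.
    """
    regex = re.compile(pattern)
    tail = ''
    while 1:
        data = stream.read(bufsize)
        if not data:
            return False
        window = tail + data
        if regex.search(window):
            return True
        tail = window[-overlap:]

# Example: "boundary" is split across two 8-character reads
# ('xxxxboun' then 'daryxxxx'), but the overlap still catches it.
msg = io.StringIO('xxxxboundaryxxxx')
print(search_stream(msg, 'boundary', bufsize=8))  # True
```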
Caching the whole message gets us right back to where we are today,
with the entire body in RAM. It is also more complex than I want to
tackle before release 1.0.

> I'll do some more testing then post the results. Perhaps this *is*
> a "feature" that is better implemented as an option. The speed
> tests should let me know...
>
> Sorry I spoke so soon, and thanks for pointing that out to me Tim!

No problem. Just didn't want you thinking you'd found the magic bullet
and then ending up not seeing any real improvement.

Tim

_________________________________________________
tmda-workers mailing list ([EMAIL PROTECTED])
http://tmda.net/lists/listinfo/tmda-workers
