well... I spent several years writing Java in the '90s, and am quite certain that SpamAssassin would perform a *lot* worse if written in Java.
SpamAssassin is heavy on regular expressions, and *very* optimised for Perl's VM.

On top of that, I'm pretty sure it would be quite hard to get faster performance out of Java *anyway*. First off, the perl VM uses a much more CISC-like strategy than Java's: operations like regexp matches, string modifications, hash lookups, array operations, and so on are each a single VM opcode, implemented in C. That means those operations in perl will be nearly as fast as the equivalent C (at least if you choose the right operations, of course!). Java, OTOH, uses more RISC-like opcodes, and implements much of its core library in pure Java (HashMap lookups or regexp matches, for example), so quite a few more pure-Java ops are required to perform them. (At least this was the situation the last time I looked, which admittedly was JDK 1.2 or so ;). Maybe this has changed since then.)

For what it's worth, in my experience, Perl's performance is often as fast as anything I could write in any other language, except for specific, low-level bit-twiddling like the Rabin-Karp fast parallel string matching algorithm I hacked up recently. Perl is a *really* nice language for performance, in my opinion.

Java's memory consumption, too, is frankly horrific compared to perl's. Perl's garbage collection, for example, is quite deterministic: when an object's refcount hits zero, it is immediately freed. Java's, OTOH, relies on occasional GC runs, and in my experience that can go quite awry, resulting in weird hangs at odd times. Virtually every large Java project I've worked on has had the occasional call to System.gc(); thrown in at strange places because of this! This has been a problem in Java since 1.0, and the Java hackers I've talked to recently still complain about it in current releases.

so, in conclusion: go perl. ;)

--j.

Matt Kettler writes:
> Eric A. Hall wrote:
> > Thinking about the GPL Java announcement some, and trying to imagine the
> > kinds of opportunities this allows for, it occurs to me that SpamAssassin
> > might be a natural fit for Java.
> >
> > I'm just thinking out loud here, not advocating anything...
> >
> > Would it run better? Would it be faster, have a smaller memory footprint,
> > better reclamation, better hooks for plugins etc? OTOH, would it be harder
> > to build, given the dependence of SA on perl modules?
> >
> There have been about 3 dozen other folks who have asked about porting SA
> to C/C++/Java/Python/<Insert any other language here>.
>
> In general, SA would suffer severely from a conversion to Java, or any
> other language.
>
> It all fundamentally boils down to two things:
>
> 1) perl has a substantial base of text parsing and utility libraries
> that no other language can match. Java does have native regex support,
> so it has a leg up over the others, but it still lacks many of the
> libraries that SA is so heavily entrenched in. Do you know of any
> equivalent to IP::Country::Fast, for *ANY* other language? Admittedly
> that one is not used by everyone, but the MIME parsers, base64 decoders,
> HTML parser, Net::DNS, etc. would be tough to find good matches for
> without having to write/maintain your own. This kind of text
> manipulation is what perl is actually very good at, and has lots of
> support libraries for.
>
> 2) Most importantly, consider that all of the existing devs who
> maintain the code are perl developers, and not all of them are Java
> developers. Poof, there goes at least some, if not all, of your
> development team down the tubes. This is by far the most significant
> hurdle. Who would we lose here, and can we afford to lose the
> spam-fighting expertise these people have?
>
> That said, I'm a C/C++/assembly developer myself, and my own personal
> reaction is "why would you want to convert from one lumbering hulk of a
> language with an expensive interpreter to another lumbering hulk of a
> language with an expensive VM?" And yes, I know Java is "JIT compiled",
> not interpreted, but AFAIK this is not as different from how perl works
> as you might think. Perl code isn't strictly interpreted from scratch
> every time you pass through the same code. Perl is really compiled and
> optimized at load time into bytecode, then interpreted from that. This
> makes perl's startup much slower, but its runtime isn't as slow as a
> purely interpreted language's. As for size, perl interpreters and Java
> VMs are both large.
>
> And yes, you can native-compile Java to machine code, but I doubt your
> gains here will be significant.
>
> My bet is that SA spends 99% of its time in regex evaluation or
> network lookups. Regex execution is VERY well optimized in both
> languages even without native compilation, so that won't be helped much,
> if at all. Network lookups are basically spending their time waiting;
> you can't wait any faster in machine code than in a semi-interpreted
> application.
>
> I also expect a lot of the memory usage is the annotation tables and
> such for regexes. It would be interesting to compare the size of spamd
> without any rules loaded against one with a stock ruleset. The
> difference between the two can't really be improved by any means other
> than using a slower regex interpreter that doesn't use tables as
> extensively.
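Matt's point is that perl compiles code once at load time and then reuses it, with regex work dominating the run. That has a direct analogue in Java's built-in regex support (`java.util.regex`): compile each `Pattern` once, then reuse it for every message. The sketch below is a hypothetical illustration, not SpamAssassin code. The class name `RuleMatcher`, the rule names, and the patterns are all invented, and it shows only the compile-once, match-many structure. It does not attempt SA's scoring or any other behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical rule matcher: compiles regex "rules" once, then reuses them.
public class RuleMatcher {
    // Rule name -> compiled pattern; LinkedHashMap keeps the rules in insertion order.
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    // Compile every pattern once, at load time, instead of once per message.
    public RuleMatcher(Map<String, String> source) {
        for (Map.Entry<String, String> e : source.entrySet()) {
            rules.put(e.getKey(), Pattern.compile(e.getValue(), Pattern.CASE_INSENSITIVE));
        }
    }

    // Reuse the compiled patterns for each message; find() looks for a match anywhere.
    public List<String> matchingRules(String message) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Pattern> e : rules.entrySet()) {
            // A Matcher is cheap and per-message; the Pattern itself is shared.
            if (e.getValue().matcher(message).find()) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Invented example rules, for illustration only.
        Map<String, String> src = new LinkedHashMap<>();
        src.put("HYPOTHETICAL_FREE_MONEY", "\\bfree\\s+money\\b");
        src.put("HYPOTHETICAL_ACT_NOW", "\\bact\\s+now\\b");
        RuleMatcher m = new RuleMatcher(src);
        System.out.println(m.matchingRules("Get FREE money today, act now!"));
    }
}
```

Running `main` should print both hypothetical rule names in insertion order. A `Pattern` is immutable and safe to share across threads, while a `Matcher` is not, which is why `matchingRules` creates a fresh `Matcher` for each message.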