On Dec 17, 2007 3:38 PM, Levi Pearson <[EMAIL PROTECTED]> wrote:
> I've read that page before, too. It's pretty vague and hand-wavy, and
> provides no real statistics, but it does give some good information
> about how the JVM has improved. Anyway, you made me go look up the
> paper I had read before. You can find a link to the pdf/ps file here:
> http://citeseer.ist.psu.edu/hertz05quantifying.html .
>
> The paper was published in 2005, so it's fairly recent, and it was
> performed with Java. It is far less hand-wavy, contains real
> experimental data, and also explains that things are worse than I had
> said earlier. With 3x the heap space, it runs 17% slower than with
> explicit memory management. With only 2x the heap space, it runs 70%
> slower. When you start interacting with the OS's paging subsystem,
> you get order of magnitude performance drops in comparison with
> explicit management.
I'm not all the way through the paper, but some things already smell funny.

As far as I can tell, they are simulating a 2 GHz PowerPC G4 system on a 1 GHz PowerPC physical machine. What? Why? They are also using ancient GC algorithms that aren't used by Java or .NET. They are running on the Jikes RVM (a research JVM that is itself written in Java), so the cache hit rate is going to be affected, and they aren't using a JIT compiler that could hoist small object values into registers.

Worst of all, they didn't rewrite the test applications in an explicitly managed language for an honest comparison. Instead, they recorded the times at which the JVM collected the objects and inserted a call to free() in their place -- how convenient! I would love to have those runtime-calculated decisions known at compile time in a language like C, but that's not possible unless you're mocking up a contrived test like these guys did. So this is completely unrealistic for writing a true explicitly managed application. In a real C app you have to manually track the lifetime of a struct or array and know when it is safe to free it (there's a rough sketch of what I mean after my sig). The common result is far more frequent malloc/free pairs than strictly necessary, just to avoid memory leaks, dangling pointers, and so on.

At any rate, I think these guys raise some interesting points about the theoretical advantages of explicit memory management, but in practice those advantages rarely materialize. Essentially they are cheating by using a hybrid process that gets the best of both worlds. In most cases, people writing software in C/C++ will end up spending more time in malloc/free calls, will have less contiguous heap space and thus hit the L2 cache less often, and will rarely get small objects kept in registers the way a GC'd language's JIT can.

However, I will agree that the bit about hitting the OS swap partition is absolutely true. If your language uses a GC and you hit the swap file, it is absolute death to performance. GC has costs, but those costs are paid for (and then some) by having contiguous memory that will likely be found in the L2 cache when needed. If you can't find your objects in cache, that hurts... if you can't find them in main memory at all, that REALLY hurts, because the GC visits the entire heap on a full collection, so you have to read ALL of your heap back into main memory if you ever swap.

> I'm a big fan of garbage collection, but it does have a significant
> cost in the space efficiency of programs. I'd like to see further
> research done into other schemes, like region-based memory management,
> that reduce the space cost when such an optimization is needed, such
> as in embedded systems.

Region-based memory management is awesome. That is basically what these guys were mimicking with this test (well, actually even better, because they didn't have to do any object tracking whatsoever). Do you use region-based techniques in your embedded code? (I've tacked a toy arena/region allocator onto the end of this message to show the sort of thing I mean.)

-Bryan
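
P.S. Since I was hand-waving above about manually tracking lifetimes: here's a rough sketch of the kind of bookkeeping I mean in C. The names (message_create, message_free) are made up purely for illustration; this isn't from the paper or any real codebase.

#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, invented for this example. */
struct message {
    char  *body;
    size_t len;
};

/* The caller owns the returned struct and must release it with
 * message_free(); no collector will notice when the last reference
 * goes away. */
struct message *message_create(const char *text)
{
    struct message *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;

    m->len  = strlen(text);
    m->body = malloc(m->len + 1);
    if (m->body == NULL) {
        free(m);            /* partial construction: clean up eagerly */
        return NULL;
    }
    memcpy(m->body, text, m->len + 1);
    return m;
}

void message_free(struct message *m)
{
    if (m == NULL)
        return;
    free(m->body);          /* inner allocation first */
    free(m);
}

int main(void)
{
    struct message *m = message_create("hello");
    /* ... use m ... */
    message_free(m);        /* the programmer, not a runtime oracle, decides when */
    return 0;
}

The point is that the decision about when free() is safe lives in the programmer's head and the ownership convention, not in any oracle derived from GC traces.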
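And here is the toy bump-pointer region/arena I mentioned, again just my own illustration (region_create, region_alloc, and region_destroy are invented names). Everything allocated out of the region dies at once, so there's no per-object tracking, and allocations come out contiguous, which is friendly to the cache.

#include <stdint.h>
#include <stdlib.h>

/* A toy bump-pointer region: objects allocated from it all live until
 * region_destroy(), so individual objects never need to be tracked. */
struct region {
    uint8_t *base;
    size_t   used;
    size_t   cap;
};

struct region *region_create(size_t cap)
{
    struct region *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->base = malloc(cap);
    if (r->base == NULL) {
        free(r);
        return NULL;
    }
    r->used = 0;
    r->cap  = cap;
    return r;
}

/* Bump allocation: just advance an offset; allocations are contiguous. */
void *region_alloc(struct region *r, size_t n)
{
    size_t rounded = (n + 7u) & ~(size_t)7u;   /* keep 8-byte alignment */
    if (r->used + rounded > r->cap)
        return NULL;
    void *p = r->base + r->used;
    r->used += rounded;
    return p;
}

/* One free() releases every object allocated from the region. */
void region_destroy(struct region *r)
{
    free(r->base);
    free(r);
}

int main(void)
{
    struct region *req = region_create(64 * 1024);
    char *name = region_alloc(req, 32);
    int  *ids  = region_alloc(req, 100 * sizeof *ids);
    (void)name; (void)ids;
    /* ... handle one unit of work ... */
    region_destroy(req);    /* everything goes away at once */
    return 0;
}

Obviously a real embedded arena would want stricter alignment guarantees for the target and probably a fixed static buffer instead of malloc, but the shape is the same.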
