On Mon, Feb 9, 2009 at 1:11 PM, chromatic <[email protected]> wrote:
> I wanted to measure the overhead of the calls to the FileHandle PMC's puts()
> method, so I changed the relevant ops (print_p_i and print_p_sc) to call
> Parrot_io_write_buffer() directly.  The benchmark runs over five times faster.

On a tangential matter, maybe we need to consider changing "puts" to be
a VTABLE instead of a METHOD? It's a common-enough operation, and the
immediate performance win would be large in these cases. Not only can
we use it for file handles and sockets, but we could add "line buffered"
puts to things like string arrays, and I'm sure HLLs are going to be
writing their own subclasses too. Alternatively, maybe we could use the
push_string VTABLE to serve the same purpose.

Of course, this does skip the root of the problem: METHODs are
inherently slow, and simply avoiding them (or converting them into an
ever-growing list of VTABLEs) is not a solution. Which brings me to...

> The culprit, as usual, is that converting between C calling conventions and
> Parrot's calling conventions is slow.  The unmodified benchmark generates
> 3,049,533 new PMCs.  The modified benchmark generates 2,655 new PMCs.  For
> reference, "Hello, world!" in PIR generates 1,454 new PMCs.

I still can't really understand where all these PMCs are coming from.
It seems like an unbelievably huge number for such a simple benchmark.
I'm sure there are a few places where we could be doing in situ PMC
reuse instead of allocating them fresh and hoping the GC will deal with
the wreckage. There are a few places where we could avoid creating PMCs
entirely, maybe using some more primitive structures to manage small
amounts of data if needed. Some things that we calculate up front can
be skipped: for instance, type tuple PMCs can be avoided unless we are
calling a multi.

Looking at the generated C code for the puts method, I can immediately
see a few things that are suspect, which brings a lot of questions to
mind: Why are we generating a _params_sig PMC here, when it seems to
serve no purpose whatsoever? Why are we creating a _returns_sig too?
Don't we have the signature for the method already generated as part of
Parrot_PCCINVOKE?
It doesn't look like the _params_sig PMC is actually being used
anywhere either. Also, why do we create a RetContinuation PMC here?
What is its purpose?

Parrot_PCCINVOKE, and variants, create a new context for the method
when it's called, although NCI.pmc:invoke() doesn't. So in some cases
we appear to be creating two contexts for a method call instead of just
one.

> (The problem isn't in the garbage collector, however.  The GC runs 454 times
> for the modified benchmark and 2793 times for the unmodified benchmark -- only
> 6.5 times more for the slow benchmark, despite there being 2500 times more
> garbage to collect.  This benchmark is actually really easy on the GC, because
> it's a tight loop where almost everything is garbage -- walking the anchored
> set is easy because it's *tiny*.)

It is refreshing to hear that the GC isn't the source of the problem,
although I'm sure there is room for improvement here too. Even though
it has only a small set of PMCs to mark, it still has a huge set of
PMCs to sweep, and each one needs to be touched and examined. If most
PMCs are very short-term garbage, there is a case to be made for
implementing a semi-space collector or an aggressive compacting
collector instead of the incremental mark-and-sweep we've been looking
at. Anything we could do to kill large contiguous batches of dead PMCs
in a single operation would be better than sweeping through the arenas
and killing them off one by one.

Lots of areas here for improvement; we just need to decide how to fix
the problems and what to tackle first.

--Andrew Whitworth
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev
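
A footnote on the sweep-versus-bulk-reclaim point above. The sketch
below is illustrative C, not Parrot code: Obj, Space, space_alloc,
mark_and_sweep and evacuate are all hypothetical names, objects hold no
references to each other, and the roots are assumed to be distinct. It
only aims to show why a sweep costs time proportional to everything
allocated, while a semi-space style evacuation costs time proportional
to the survivors.

```c
#include <assert.h>
#include <stddef.h>

#define SPACE_SIZE 4096

/* Hypothetical stand-in for a PMC header; not Parrot's real layout. */
typedef struct Obj {
    int live;      /* set by the mark phase when the object is reachable */
    int payload;
} Obj;

/* A bump-pointer space: allocating just advances `top`. */
typedef struct Space {
    Obj    slots[SPACE_SIZE];
    size_t top;
} Space;

static Obj *space_alloc(Space *s, int payload)
{
    Obj *o;
    if (s->top == SPACE_SIZE)
        return NULL;               /* full: the caller must collect first */
    o = &s->slots[s->top++];
    o->live    = 0;
    o->payload = payload;
    return o;
}

/* Mark-and-sweep style: mark the roots, then visit every allocated slot.
 * Returns the number of slots the sweep had to visit. */
static size_t mark_and_sweep(Space *s, Obj **roots, size_t nroots)
{
    size_t i, visited = 0;
    for (i = 0; i < nroots; i++)
        roots[i]->live = 1;
    for (i = 0; i < s->top; i++) {
        visited++;                 /* dead or alive, the slot is examined */
        s->slots[i].live = 0;      /* reset the mark for the next cycle */
    }
    return visited;
}

/* Semi-space style: copy the survivors into `to`, then discard all of
 * `from` in one step. Returns the number of objects copied.
 * Simplification: no forwarding pointers, so roots must be distinct. */
static size_t evacuate(Space *from, Space *to, Obj **roots, size_t nroots)
{
    size_t i, copied = 0;
    for (i = 0; i < nroots; i++) {
        Obj *copy = space_alloc(to, roots[i]->payload);
        assert(copy != NULL);      /* `to` is assumed to have room */
        roots[i] = copy;           /* the root now points at the new copy */
        copied++;
    }
    from->top = 0;                 /* every dead object goes away here */
    return copied;
}
```

With 1000 objects allocated and 2 of them rooted, mark_and_sweep visits
1000 slots, while evacuate copies 2 objects and resets the rest with a
single assignment. The price is the second space and the copying of
survivors, which is why this only pays off when most objects die young.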
