Just now, Danny Yoo wrote:
> But I don't know how to interpret the results I'm getting back from
> the profiler yet.  I guess I'm trying to say: I'm illiterate: I
> don't know how to read the textual output of the profiler, and need
> some help.
The output is a conventional one.  (It's not completely compatible
with gprof (I've based it on the output from Allegro Lisp), but it's
close enough that it wouldn't be hard to adapt it.)

> Total cpu time observed: 26160ms (out of 26538ms)
> Number of samples taken: 418 (once every 63ms)
> (Hiding functions with self<1.0% and local<2.0%: 16 of 369 hidden)

This part should be obvious.  One potential problem is that you have
a very small number of samples -- it would help to run your code
multiple times, or to sample more frequently, so you get more data.

> [...]

It prints long lines, which your mailer butchered (I tried to fix
them below).

>     map [120]                    16.0%
>     loop [126]                   84.0%
> [2] 1576(6.0%)  0(0.0%)  parse-def-values
>                          ...parse-bytecode-5.1.2.rkt:273:3
>     parse-lam [8]                80.7%
>     parse-case-lam [10]           9.6%
>     parse-let-void [14]           6.3%
>     parse-with-cont-mark [154]    3.3%

So, this entry gives you information about `parse-def-values', which
is defined in that file, at 273:3.  (The "..."s are needed because
without them the lines would turn from inconveniently long to
problematically long; more on that below.)  Here are some random bits
that should explain the rest:

* It was running for 1576ms, which is 6.0% of the overall runtime.
  (Remember that "it was running" is an estimate, which actually
  means "I observed an active stack frame for this call N times,
  which sum up to that time".)

* It spent 0ms in its own code, which is 0.0% of its running time.
  In this case, 0 means that this function was never at the top of
  the stack, which means that it's very unlikely to be a bottleneck.
  To cut runtime costs, you'd be looking for functions with high self
  times.  But this is not a hard rule -- it might be that such a
  function calls some other measured function which you can't
  optimize, so you'd instead need to reduce the number of calls made
  to it...

* Overall, `map' was the direct caller of this function for 16% of
  its runtime, and `loop' was the caller for the other 84%.
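To get more samples in practice, you can lower the sampling delay and
repeat the run.  Here's a minimal sketch using the `profile' library's
`profile-thunk' -- `heavy-computation' is a made-up stand-in for your
own code, and you should check the library docs for the exact keyword
defaults in your version:

```racket
#lang racket
(require profile)

;; A made-up stand-in for the code being profiled.
(define (heavy-computation)
  (for/sum ([i (in-range 1000000)]) (* i i)))

;; Sample every 1ms instead of the default delay, and run the
;; thunk 5 times, so the profiler sees more stack snapshots.
(profile-thunk heavy-computation
               #:delay 0.001
               #:repeat 5)
```

More samples means the percentages in the report are less noisy, at
the cost of some extra profiling overhead.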
* Obviously, `loop' is not a good enough name, but the "[126]" to its
  right is an identifying index into the results -- if you search for
  it, you'll find its entry, with its source location (if it has
  one).  Note that this function (`parse-def-values') has a "[2]"
  label that identifies it for such occurrences elsewhere.  This is
  even more obviously useful for functions with no name.

* Overall, for 80.7% of this function's runtime, the callee that it
  invoked was `parse-lam' -- so if this function is a bottleneck, you
  should look there for what to optimize.

>     compile-splice [3]           24.3%
>     compile [141]                75.7%
> [3] 598(2.3%)  48(0.2%)  compile-splice
>                          ...song/compiler/compiler.rkt:570:0
>     compile-def-values [9]       75.0%
>     compile-splice [3]           24.3%

This is an example of a function that did significant work in its own
body.  The callers part doesn't sum up to 100% since it called itself
a number of times, and the same goes for the callees part.

> I'll cut off the output here, because there's pages and pages of
> this.  I don't know what I should be looking at.  Are the nodes
> being printed in any particular order?

I think that they're printed in a topological order, but I'm not
sure.  There are some obvious tradeoffs here: if the output is sorted
by self runtime (or any other measure), it's easy to find the
function that spent the most time in its own code.  But it's rare
that you'd want to change something in that function itself -- more
frequently you'd want to deal with its callers and callees, and with
such a sort you'd end up jumping up and down the page to figure out
the flow.  A topological sort makes that kind of thing easier, though
I'm not sure it's the right choice, since sorted numbers are easier
to glance over to spot "interesting" areas.  If you have ideas on how
to improve the docs, feel free to suggest patches...

> What else can I do to make the output sensical?
> The fact that the function names are getting cut off with the
> '...'s is unhelpful.

Yeah, that's a problem too -- as I said above, there are obvious
limitations to this kind of output.  One thing that you may want to
try is the graphviz renderer.  That would be limited too, and I think
that a real improvement would come from some interactive renderer --
either some GUI thing or some HTML thing:

* A GUI thing would be great, since source locations and generally
  more descriptive information could appear in popups etc.  I started
  on one once; my intention was to do something like the graphviz
  layout, and that turned out to be (surprise) pretty hard to do.  I
  can probably send the code to anyone who's interested in continuing
  it.  Another option is to just let graphviz do the layout, making
  the renderer available only when graphviz is present.  BTW, the
  layout is important, since without it you get an indecipherable
  pile of boxes that you need to sort out to see how things go -- and
  you'd need to do that after every optimization iteration, which
  makes it impractical.  (Also, a naive layout like the module
  browser's won't work, since you get much longer chains, and many of
  them side by side.)

* An HTML thing could start with the graphviz HTML mode, which
  generates a picture and a corresponding image map.  The rest of the
  information could be done in a similar way to the GUI thing -- with
  popups etc.  The problem with that approach is that it's sometimes
  useful to reorganize things as you sort them out, and that wouldn't
  be available here without building something equivalent to the GUI
  (ie, let graphviz do the layout, and write the rest in JS).

* Finally, another option that's worth exploring is to see whether
  there's some good visualization tool that works with gprof output,
  or some other format (I'm sure there are), and then write a
  renderer that produces that output.
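The graphviz renderer mentioned above can be plugged in through the
`#:render' keyword.  A sketch, assuming the `profile/render-graphviz'
module and a separately installed graphviz; `heavy-computation' is
again a made-up placeholder, and the renderer's exact calling
convention may differ across library versions:

```racket
#lang racket
(require profile
         (prefix-in gv: profile/render-graphviz))

;; A made-up stand-in for the code being profiled.
(define (heavy-computation)
  (for/sum ([i (in-range 1000000)]) (* i i)))

;; Emit a dot-format call graph instead of the textual table;
;; redirect it to a file and render it with graphviz, e.g.:
;;   racket prof.rkt > prof.dot && dot -Tpng prof.dot -o prof.png
(profile-thunk heavy-computation #:render gv:render)
```

That sidesteps the long-lines problem, though as noted above you
still have to re-run the layout after every optimization iteration.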
IIRC, the actual gprof output is limited in that it hides some of the
details, and I think that I've seen an output format that is designed
to provide more information in a machine-parsable way.  BTW, it's
easy to write such renderers, and it's even easier to write a trivial
renderer that just dumps the call chains that were observed -- in
case you know of a tool that takes that as its input.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

_________________________________________________
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/users