On Dec 16, 2010, at 2:00 PM, Brice Figureau wrote:

> On 16/12/10 22:15, Luke Kanies wrote:
>> On Dec 16, 2010, at 5:00 AM, Brice Figureau wrote:
>>> [snipped myself]
>>>
>>> In short: is this a good idea? Is there a better solution?
>>
>> I guess I want to separate the question a bit, between is it a good
>> idea to have instrumentation, and is the proposed solution a good
>> one?
>>
>> To the first, I say easily yes - some form of instrumentation in
>> Puppet beyond the basic benchmarking of very limited things we have
>> would be fantastic.
>>
>> To the second, I guess there are a few questions I'd like to think
>> about before being able to answer it. Feel free to ignore any of
>> these if you think they're irrelevant; they're just what popped into
>> my head as a means of breaking it down.
>>
>> * Are there other examples of this problem being solved in the Ruby
>> community that we can crib off of? It'd be great not to have to
>> reinvent the wheel, especially in terms of design. I really wish we
>> could just add dtrace probes for it, but that only works on Solaris
>> and OS X.
>
> To my (extremely limited) knowledge there's none outside the
> ruby-dtrace[1] project.
That's what I feared.

>> * Is there a difference between instrumentation to discover the
>> problem behind stuck processes, and instrumentation to generally know
>> what's going on in Puppet? I.e., people probably would like some
>> more understanding of what's slow without having to open a debugger,
>> but this probably doesn't require the extra step of threads reporting
>> separately.
>
> My proposal is a dumb and simple system; it's not designed to cover
> every use case, but it should at least keep people from firing up gdb
> to inspect stack traces when they look at an apparently stuck process.

Sorry, I didn't mean it like that. What I was really trying to figure
out was whether this instrumentation system would work well even in
cases where there weren't hung processes but people wanted to figure
out, for instance, why a given process was slow rather than hung. And
also, whether we could realistically look at a generalized
instrumentation solution, which I think would be great.

I agree, though, that it'd be really good to have a simple means for
people to figure out why a given process is hung, and if that's all
this does, it'd still be worth doing, simple or not.

>> * Do you have an idea of how you might implement the methods that do
>> the instrumentation?
>
> OK, maybe I was too enthusiastic, and what I propose can't really be
> called instrumentation. It's more a visual signal to understand what
> takes so long in a process.

Heh. It's at least a good first step.

>> * How do you decide where to put the instrumentation? And how
>> worried are you about doing so at the subsystem level (e.g.,
>> networking, parser, indirector) vs. individual classes?
>
> My overall plan was simply to cover various high-level aspects of the
> master (i.e. parsing of individual files, some parts of the compiler,
> file serving) and resource evaluation (i.e. in the transaction).
> This should be no more than 10 different places where I'll add my
> "probes".

Ok.
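To make the "visual signal" idea concrete, here's a minimal sketch of what such a probe could look like in Ruby. All names here (`Probe`, `measure`, the `parse_file` label) are illustrative, not actual Puppet APIs: a block is timed and the elapsed wall time logged, so a slow section shows up in the log without opening a debugger.

```ruby
require 'logger'

module Probe
  LOGGER = Logger.new($stderr)

  # Time the given block and log the elapsed wall time under a label,
  # returning the block's result unchanged.
  def self.measure(label)
    start = Time.now
    result = yield
    LOGGER.info("probe #{label}: #{'%.3f' % (Time.now - start)}s")
    result
  end
end

# Usage: wrap a suspect section of the master, e.g. file parsing.
Probe.measure("parse_file") do
  sleep 0.01 # stand-in for real parsing work
end
```

Since `measure` passes the block's return value through, a probe like this can be dropped around an existing expression without changing the surrounding code's behavior.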
>> I know this is a lot of background, but it's a top-level
>> cross-cutting design, and I think it's hard enough that I've thought
>> about it for a long time and never delivered on anything.
>
> Yes, I've been thinking about this for a long time. I failed to find a
> global way to solve the problem, so I thought that a small but maybe
> useful attempt is better than nothing :)
>
> Anyway, even if this never gets merged upstream I'll produce the patch
> for those willing to use it (like myself) ;-)
>
> Now, if we want to think about a more general system, I don't think
> I'd use dtrace. Of course it's elegant because it doesn't use any
> system resources, but it's not portable enough to justify crippling
> our whole codebase for it. And I don't think accumulating a couple
> hundred (if not fewer) instrumentation metrics would be such a
> performance issue.
> So one solution would be to set the probes ourselves (like I did in my
> earlier example). Each of these probe "blocks" would accumulate CPU
> time/memory and call-site counts (and/or anything else we can get from
> Ruby using something like proc-wait3). Those could be accumulated in
> a thread-safe array. Then sending a signal to a puppet process would
> dump this array to the log or a given file, or we could get it through
> the status indirection.

Note that blocks in Ruby are, um, not terribly performant, so having
block-based probes actually could hurt performance (but they're still
probably worth it).

> The complex question, as you asked, is where to put the probes :)
> I think this could be an ongoing process that we refine from release
> to release, starting with the high-level ones I described.
> It's also possible to act transversally in the indirector, the network
> layer, and the transaction layer with a fully generic system. That
> would cover almost every aspect of a puppet process.
>
> But as with defining metrics, the important question to ask ourselves
> (and the users) is: what would be interesting to measure?
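A rough sketch of the accumulate-and-dump-on-signal scheme Brice outlines might look like the following. Everything here is hypothetical (the class and method names are not Puppet APIs), and it takes two liberties: it accumulates into a mutex-protected hash keyed by probe label rather than an array, and it tracks only wall time and call counts, not CPU time or memory.

```ruby
# Each probe accumulates elapsed time and a call count under a label,
# guarded by a mutex so concurrent threads can probe safely. Sending
# the process a signal dumps the accumulated table, so a stuck or slow
# process can be inspected without attaching gdb.
class Instrumentation
  def initialize
    @mutex = Mutex.new
    @stats = Hash.new { |h, k| h[k] = { time: 0.0, calls: 0 } }
  end

  # Run the block, then record its elapsed time and bump the call
  # count for this label (ensure runs even if the block raises).
  def probe(label)
    start = Time.now
    yield
  ensure
    elapsed = Time.now - start
    @mutex.synchronize do
      @stats[label][:time]  += elapsed
      @stats[label][:calls] += 1
    end
  end

  # Write one summary line per label to the given IO.
  def dump(io = $stderr)
    @mutex.synchronize do
      @stats.each do |label, s|
        io.puts "#{label}: #{s[:calls]} calls, #{'%.3f' % s[:time]}s total"
      end
    end
  end
end

PROBES = Instrumentation.new

# Dump accumulated stats when the process receives USR2
# (guarded, since not every platform defines this signal).
Signal.trap('USR2') { PROBES.dump } if Signal.list.key?('USR2')

PROBES.probe('compile') { sleep 0.01 }
```

One design note: dumping from inside a signal handler keeps the interface simple, but handlers run on the main thread, so a production version would likely just set a flag in the trap and do the actual I/O elsewhere.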
> What question should the instrumentation system answer?
>
> I should be able to propose a proof-of-concept implementation of the
> generic system later if needed. Maybe we could refine the system based
> on some real code we can discuss?

That would be awesome.

--
Do you realize if it weren't for Edison we'd be watching TV by
candlelight? -- Al Boliska

---------------------------------------------------------------------
Luke Kanies -|- http://puppetlabs.com -|- +1(615)594-8199

--
You received this message because you are subscribed to the Google
Groups "Puppet Developers" group.
For more options, visit this group at
http://groups.google.com/group/puppet-dev?hl=en.
