On Tue, Jul 27, 2010 at 12:56 PM, Emile Anclin <[email protected]> wrote:
> * Remark about Inference: I once tried to cache the *direct* inferences,
> i.e. calling infer() without the context. There, I discovered that:
> - only 5% of the nodes are inferred more than once
> - we basically only infer Name and Getattr nodes directly
> - however, there should be a lot of double-inferring with regard
> to the Context / Callcontext management; I think there is some
> low-hanging fruit there (as was discovered during our
> last ASTNG refactoring) for a simple inference cache.

Many thanks, Emile, for your comments.

I think the essence of the situation, as far as global analysis goes, is that type analysis is "hard" in the sense that it depends on assignments that can happen anywhere in the program. But if none of those assignments change, then the type of an object (or rather, of a name) doesn't change.

> * Concerning your *Inc-lint*: What is quite obvious is that Pylint loses
> a lot of time parsing, almost always, basically all builtins and all
> of the Python standard library.

Yes, the symbol tables (and deductions) for those files should be cached.

> So the question is, how to store all the data about modified and
> unmodified files, including a relevant representation of the astng tree?

The ast (or astng) trees would be cached merely to discover the special case that the old and new versions of a file differ only by comments or whitespace. This is a cute check, but it's optional. The key to the design is that it is *not* based on diffs to ast trees.

Indeed, local analysis will be slightly slower than the corresponding parts of pylint. It does the same general ast-related checks that pylint does, but in addition it computes (from scratch, not incrementally) both the module symbol table and all the deductions that will have to be satisfied again.

> And how can reading all this information be done faster than just astng?

It's not faster. It's slower. But unpickling is pretty fast, and it's not much of a concern.
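To make this concrete, here is a minimal sketch of the caching and diffing idea, assuming a deliberately simplified symbol-table representation (a dict mapping each top-level name to a set of inferred type names). The function names and the on-disk format are hypothetical, not new-lint's or astng's actual API; the point is only that pickling is cheap and that a dict diff isolates the names whose deductions need re-checking.

```python
import pickle

def save_symbols(path, symbols):
    """Cache a module's symbol table on disk between runs."""
    with open(path, "wb") as f:
        pickle.dump(symbols, f)

def load_symbols(path):
    """Reload a cached symbol table; unpickling is fast."""
    with open(path, "rb") as f:
        return pickle.load(f)

def diff_symbols(old, new):
    """Return the names whose entries differ between two versions.

    Only deductions that mention these names need re-verification;
    the rest of the global deduction list is untouched.
    """
    changed = set()
    for name in old.keys() | new.keys():
        if old.get(name) != new.get(name):
            changed.add(name)
    return changed

# Example: adding a method to a class changes exactly one entry.
old = {"Foo": {"class"}, "Foo.spam": {"int"}}
new = {"Foo": {"class"}, "Foo.spam": {"int"}, "Foo.eggs": {"str"}}
print(diff_symbols(old, new))  # only the added method shows up
```

In this toy form, adding Foo.eggs yields a changed set of exactly one name, which is the point made below: writing new methods leaves the rest of the (very large) set of global deductions alone.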
The speedup comes because the diff phase shows where the old and new versions of the *symbol tables* differ. This in turn allows us to update the *global* list of needed deductions. So yes, there is overhead, but it should be hugely effective overhead.

In particular, suppose I am writing new methods of a class. I typically will not change the class attributes, except to add the new methods. These methods might satisfy some global deductions that previously failed, but otherwise *none* of the (very large) set of global deductions will have to be re-verified. So inc-lint will analyze the changed files, but none of the inference machinery will be needed, because the symbol tables will *already* contain all the needed information. Or so I hope :-)

> At least if we could just store the information about unmodified modules,
> it would probably be a huge improvement, and I wonder if going below
> would result in another important improvement: the _ast parsing of a
> single file is quite fast.

The problem is that type inference is a completely global matter. Any assignment (including a function return) anywhere in the entire program can change the set of types that a name can take on. There is simply no way around this: global inference can be expensive. Thus, the design must be ambitious enough to handle all possible ripple effects.

I believe that the data-flow algorithm prototyped in new-lint is ambitious (and general) enough, but that's not proven. Clearly, though, a data-flow algorithm is the only kind of general algorithm that has a chance of working. It is fundamentally iterative, in the sense that deductions can trigger other deductions indefinitely. The trick is to set things up so that a deduction fires only once, when all needed data are available. The new-lint prototype appears to handle these kinds of details properly.

Edward
------------------------------------------------------------------------------
Edward K. Ream
email: [email protected]
Leo: http://webpages.charter.net/edreamleo/front.html
------------------------------------------------------------------------------
_______________________________________________
Python-Projects mailing list
[email protected]
http://lists.logilab.org/mailman/listinfo/python-projects
